ABSTRACT


Radiologists miss about 25-30% of all pulmonary nodules smaller than 1.0 cm in mass screenings.
A system for the automated detection of the pulmonary nodule has been designed, tuned, and
tested on 43 chest radiographs. The goal of this system is to aid the radiologist in locating a
pulmonary nodule by indicating a few sites in the radiograph that are most likely to be nodules.


Procedurally driven image experts that respond to specific types of anatomic features have been
devised and are incorporated in a pattern recognizer, which uses linear discriminant analysis, to
classify the candidate nodule sites. Candidate nodule sites that are not classified as nodules are
eliminated from the list of sites that are presented to the radiologist for inspection.


This work has demonstrated that pattern recognition techniques and procedurally driven image
experts are capable of reducing the number of candidate nodule sites that a radiologist must inspect
from at most 17 to at most 3 in order to be 99% confident of having inspected any nodule detected
by the system that is trained with 37 films. The radiologist must be willing to accept a film true
positive rate of 88% (as opposed to a film true positive rate of 92%) for the convenience of having
fewer points to inspect.
1 - Statement of Problem


The fundamental goal of this work is to improve the detection of the pulmonary nodule in chest
radiographs. Despite improvements in radiographic imaging technology, radiologists are unable to
detect approximately 30% of all pulmonary nodules smaller than 1.0 cm in mass screenings. The
most dangerous type of pulmonary nodule, the malignant lesion, is most difficult to detect in its
earliest stages. If the radiologist were able to detect such lesions in their early stages the patient
would often have a better prognosis for survival and the consequent treatment would be less radical.
Besides easing the radiologist's workload, automated nodule detection would provide a means of
recording sites of possible nodules that should be monitored in subsequent films and a means for
evaluating the performance of radiographic imaging processes.

It is known that the human viewer has difficulty detecting small lesions in these films. That is,
given a film that contains a small (0.5 cm or less) lesion the human viewer will often fail to detect
it. However, if the lesion were pointed out it would be recognized. It is believed that this inability
is due to limitations of the human visual system in detecting objects against a background of
(structured) noise. The computer is immune to influences from structured noise; it can tirelessly
search an entire radiograph and report the presence of all small round things. The ANDS
(Automated Nodule Detection System) processes a chest radiograph and provides a display of
sixteen or fewer sites in the chest radiograph that are most likely to be a nodule.


The possible benefit of ANDS is evident when one considers that about 30% of all visible solitary
pulmonary nodules go undetected in routine viewing of chest radiographs [Garland, 1959]
[Yerushalmy, 1951]. This limitation is presumably not due to radiographic technology but is
inherent in the human observer. The human observer can reliably diagnose pulmonary nodules 1.0
cm or larger in diameter but exhibits decreasing proficiency as the nodule diameter gets smaller.
The radiograph is capable of representing a nodule as small as 0.3 cm, which must often go
undetected for nine or more months until it reaches sufficient size to be seen [Goldmeier, 1965].
Since roentgen findings are usually present in presymptomatic stages, their recognition at early
stages is presumably of incalculable benefit to the patient. Automated diagnosis of the pulmonary
nodule offers the hope of easing the radiologist's workload by helping to limit the search area.


The specific goal of this work is to improve an automated nodule detection system that guides a
human viewer to sites on a chest film that are most likely to be pulmonary nodules and to reduce
the number of false positives that are reported by that system. The following processes comprise
the current ANDS: 1) photographic copying and digitization of the radiograph; 2) image processing to
enhance the appearance of the nodules; 3) candidate nodule detection and accumulation of votes;
and 4) elimination of false positives. The main features of the ANDS are: spline filtering to
subtract out background variation; candidate nodule, rib, and vascularity detection using Hough-like
techniques; and discriminant analysis to reduce the rate of false positives. These are further
discussed in Chapter 2.5. Note that false positive as it is used with respect to the ANDS has the
following meaning: any non-nodule that is considered more nodule-like than a nodule is a false
positive. This concept will be clarified in the section on nodule detection and accumulation of
votes. The body of the experimental work was to design and tune the ANDS. This work was done
in four phases: optimization of photographic/digital reproduction of the radiograph; evaluation of
four nodule detection processes; parameter tuning of the chosen process; and discrimination of false
positives. These are discussed in the Experimental section. The following section is an introduction
to the problem of nodule detection. It is a survey of image processing of chest radiographs and an
overview of the work done in the computer analysis of chest films. It serves as both a justification
and as a motivation for this work.


1.1  -  Occurrence  of  the  Solitary  Pulmonary  Nodule 


The prevalence of the solitary pulmonary nodule (SPN) has been reported between 1 per 1000
(films studied) [Holin, 1959] and 2 per 1000 [Good, 1958]. The study by Holin, a community-wide
survey in Cincinnati (1949), consisted of 673,281 films with 687 diagnosed as containing nodules.
Over the last four decades the incidence of bronchogenic carcinoma has quadrupled in
industrialized countries [WHO, 1965]. Deaths due to bronchogenic carcinoma were: in Japan, 1.3
per 1,000,000 in 1950 and 6.5 per 100,000 in 1960; in Great Britain, 50 per 100,000; and in the U.S.,
20 per 100,000 in 1960 [WHO, 1965]. The likelihood that a nodule is malignant increases with the
age of the patient. Walske reports malignancy in 53% of all cases aged 50 or more years and 12%
malignancy in patients under age 50 [Walske, 1966]. Steele found malignancy in 56% of all cases
aged 50 or more years [Steele, 1963]. In a review of 25 case studies which involved a total of 1203
patients, malignant lesions were found on the average to comprise 36.7% of all lesions; the
percentage of malignant nodules varied in the case studies between 7% [Jones & Cleave, 1954] and
78% [Axtmayer & Ehrlich, 1955] [Davis, 1956]. Seybold reports an average malignancy of 37.8% in
his survey of 22 case studies which involved a total of 2258 cases; the data ranged between 7% (1
of 14 cases - Jones & Cleave, 1954) and 55% (37 of 67 cases - Husfeldt & Carlsen, 1950) [Seybold,
1964]. It should be noted that the criteria for inclusion in each of these 25 studies varied as did the
sex and average age of the patients included. Some studies took place in veterans hospitals which
were predominated by older males while others took place in armed services hospitals which are
mostly comprised of younger males. Lung cancer has been shown to be about four times more
prevalent in males (21% of all male cancers) than in females (5% of all female cancers) [AMN,
1973].



1.2  -  Appearance  of  the  Pulmonary  Nodule 


A  solitary  nodule  is  a  circumscribed  mass  situated  in  the  substance  of  the  lung,  constituting  the  only 
significant  pathologic  process  in  the  lungs  of  the  patient  being  examined,  and  showing  no  significant 
signs of cavitation [hollowness] or obstruction of the airway [Good, 1963].

The SPN is also called a "coin lesion" because of its circumscribed circular shape. This is, however,
a misnomer because the shape of the nodule is in fact spherical or ovoid and not flat and round.

The SPN manifests itself in essentially two forms, as benign and malignant lesions. There are only
two conditions under which a nodule may be considered benign: the presence of dense calcification
or signs of stability lasting two or more years [Good, 1963].

Density, shape, size, location, and margination (border characteristics) are considered as possible
measures to aid the discrimination of nodules from other objects within the lungs. It is important
to note that many of the following statistics refer to nodules which were most likely diagnosed
because they were seen in chest radiographs and/or because they were resected (surgically removed).
The following statistics which regard the SPN may not necessarily apply to the general population of
radiographic images of all SPNs, especially small barely perceptible ones. Many of the following
statistics describe that population of visible nodules which are 1.0 cm or larger and whose
benign/malignant nature is known.

density 

Perhaps the most powerful discriminating feature of the SPN is optical density. Dense images of
nodules often indicate the presence of calcification - the primary feature of the benign nodule. The
density of the nodule in the film may be used to distinguish the benign from the malignant nodule.
However, the presence of some calcium does not indicate that a nodule is benign. Ten cases of 280,
3.7 percent, of primary carcinoma were read as containing some calcium [Steele, 1963]. Dense
nodules are less likely to be considered malignant [Vivas, 1953] [Steele, 1963] [Davis, 1956]. Small
lesions of heavy density are less likely to be malignant [Vivas, 1953]. Lesions smaller than 2.0 cm
and dense are usually granulomas (benign); lesions larger than 2.0 cm in diameter and not dense are
probably carcinomas [Davis, 1956]. Malignancy is unlikely in a dense or concentrically calcified
nodule [Steele, 1963]. Siegelman uses density as a means of discriminating between calcified-benign
and malignant nodules in CT images. He found that no malignant lesion in his study had a CT
number greater than 147 Hounsfield units [Siegelman, 1980]. His CT system was carefully
calibrated to produce quantitative results.


shape 


A majority of carcinomas are characterized by an irregular shape [Good, 1963] [Steele, 1963].
Thirty-seven percent of the carcinomas studied were irregular or elongated and only 11% had sharp
margins [Steele, 1963]. Shape offers little discrimination between benign and malignant nodules.


size


Although nodules approximately 3 mm in diameter are visible, the lower limit for diagnosis is
believed to be 1.0 cm [Goldmeier, 1965]. In a study of 1267 nodules, 714 could be measured either
radiographically or pathologically; 66% were 5 cm or greater, and 1.26% were less than 1.0 cm in
diameter [Theros, 1977]. A greater proportion of nodules was found to be malignant as the nodule
diameter increased [Holin, 1959]. Davis reports in his case study that the smaller nodules are most
likely to be considered benign [Davis, 1956]. One might infer from the following that small
malignant nodules are often not found. In a review of 22 case studies, Seybold concludes that few
lesions greater than 5 cm were benign and few less than 2 cm were malignant [Seybold, 1964].
Holin reports the average size of malignant nodules in his case study as 5.2 cm, and 2.5 cm as
the average size for tuberculosis nodules [Holin, 1959]. As the size of the bronchogenic carcinoma
shadow increases: operating becomes more difficult; post-operative mortality is higher; and the
overall prognosis is worse [Bateson, 1964].

Goldmeier hypothesizes that small nodules may have a lower visibility because nodules have a 2.5
mm outer shell of low density. In order to be visible the nodules must be considerably larger than
5.0 mm [Goldmeier, 1965].


location 


Tuberculous granulomas were found predominantly in upper lobes [Bateson, 1965]; and three times
more frequently in the upper lobe than in the lower [Steele, 1963]; solitary metastases were found
predominantly in the lower lobes, otherwise no other lesions showed particular distribution [Bateson,
1965]. Davis found, in his study of 215 cases, that the distribution of nodules in the lobes was fairly
even with no great distribution differences between benign and malignant nodules [Davis, 1956].
Holin in his community-wide study reports that 61% of the lesions found were in the right lung and
39% were in the left lung; more nodules were found in the lateral (68%) than in the medial portions
of the lung (32%) [Holin, 1959]. The above results may either represent reality or serve as an
indictment of the proficiency of the human diagnostician.

margination 


Most radiologists argue that it is impossible to differentiate benign from malignant nodules on the
basis of size or margination... [Davis, 1956]. Of the 100 solitary circumscribed carcinomas included in
Bateson's study, the shadows of 71% had an ill-defined margin. He reports a tendency for a higher
proportion of shadows of small carcinomas to be ill-defined and the shadows of large carcinomas to
be well-defined. He also reports that the prognosis is better for patients with nodules with
well-defined shadows [Bateson, 1964]. A ragged, fuzzy edge and an irregular outline are more often
present in primary cancer [primary as opposed to a metastatic, or secondary manifestation of
cancer] [Seybold, 1964]. In general, carcinomas tend to be less well-defined; the sharpness of the
border could not be used to distinguish between benign and malignant [Good, 1963].

In conclusion, the most harmless nodule is perhaps the easiest to detect in its early stages; and the
nodule which presents the greatest danger to the patient is seemingly the most difficult to detect in
its early stages. For example, a possible interpretation of Davis' finding, that small nodules are
most likely to be considered benign, is that malignant nodules are simply not likely to be discovered
until they are big. A similar interpretation may also be applied to Seybold's finding that few
nodules less than 2 cm were found to be malignant [Seybold, 1964] - perhaps few small malignant
nodules could be seen. Malignant nodules which are claimed to have irregular shapes and
ill-defined margins [Seybold, 1964] [Good, 1963] are presumably more difficult to detect because these
features are presumably more difficult to detect. The statistics which point to a high incidence of
malignancy among large nodules should serve as an indication of the necessity to find a means of
detecting malignant nodules when they are small.


Since the goal of the nodule detector is to find any nodule, no distinction (that is, based on
brightness) is made between benign and malignant nodules. The centers of candidate nodules (CNs,
sites in the chest film that are most likely to be nodules) are detected by a Hough-like circle
transform. The Hough circle transform has been generalized to permit detection of a variety of
bright closed shapes, not only circles. Nodule appearance characteristics that are important to
human detection of the pulmonary nodule have been incorporated in ANDS. Knowledge about the
relative brightness and irregular shape of the pulmonary nodule is embedded in the CN Expert (a
program that uses procedural knowledge to locate the center of a CN). Appearance characteristics
of the nodule border, and azimuthal uniformity, are used to discriminate nodules from the
non-nodules among the CNs that are reported by ANDS. Global knowledge, knowledge about the
nodule in relation to its environment, is also used to discriminate nodules from non-nodules.
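
The text above names a Hough-like circle transform but does not spell out its voting rule at this
point. As a rough illustration only, the following sketch assumes one common variant: each pixel
with a strong gradient casts a vote, for every trial radius, at the point that lies that distance
along its gradient direction (toward brighter values). The function name circle_center_votes, the
grad_frac cutoff, and the radii are hypothetical and are not taken from ANDS.

```python
import math

def circle_center_votes(image, radii, grad_frac=0.5):
    """Accumulate center votes for bright, roughly circular blobs.

    image is a list of rows of numbers. Pixels whose gradient magnitude is
    at least grad_frac times the largest magnitude vote once per radius.
    """
    rows, cols = len(image), len(image[0])
    grads = {}
    max_mag = 0.0
    # Central-difference gradient on interior pixels.
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            mag = math.hypot(gx, gy)
            if mag > 0.0:
                grads[(y, x)] = (gy / mag, gx / mag, mag)
                max_mag = max(max_mag, mag)
    votes = [[0] * cols for _ in range(rows)]
    for (y, x), (uy, ux, mag) in grads.items():
        if mag < grad_frac * max_mag:
            continue  # weak edges do not vote
        for r in radii:
            # Step r pixels up the gradient, i.e. toward the bright interior.
            cy, cx = round(y + r * uy), round(x + r * ux)
            if 0 <= cy < rows and 0 <= cx < cols:
                votes[cy][cx] += 1
    return votes
```

Peaks in the returned accumulator would play the role of candidate nodule centers. Because votes
follow the local gradient rather than a fixed circle template, closed bright shapes that are only
roughly round still concentrate their votes near one point, which is the sense in which such a
transform tolerates irregular outlines.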


1.3  -  Detection  of  the  Pulmonary  Nodule  by  the  Human  Viewer 

In addition to the intrinsic features mentioned in section 1.2, the surrounding anatomical contrast
greatly affects the visibility of a nodule. Nodules of decreasing contrast are increasingly difficult to
detect [Kundel, 1979]. Nodules with sharper edges are identified faster and with greater frequency
than those with less sharp edges [Carmody, 1980]. Nodules of decreasing size are increasingly
difficult to detect. Detection accuracy rates of 44% for 1.0 to 1.5 cm and 8% for 0.5 cm nodules are
reported [Kelsey, 1977]. The effect of the surround complexity, i.e. anatomical busyness, is
suggested in the finding that when the same nodule was superimposed in various lung regions 56%
of upper-left and 29% of lower-left lesions were seen [Kelsey, 1977]. Note the coincidence of this
finding with that of Steele - that tuberculous granulomas are found three times more frequently in
the upper than in the lower lobes. Kruger et al., who automated the classification of coal workers'
pneumoconiosis, report that their device correctly classified 77% of the disease in the lower left and
81.5% in the upper left lung. This disparity between human and automated detection in these lung
regions suggests that a perceptual rather than a pathological basis may be responsible for these
findings and that automated methods may exhibit less error.

Viewing distance and brightness level have been shown to affect nodule detection. Shea et al. have
found that the peak of the VSTF (Visual System Transfer Function) decreases in amplitude and
frequency as brightness decreases. They propose that every abnormality has a unique (optimal)
viewing distance [Shea, 1977]. Hemmingsson et al. superimposed 2.0 cm diameter lesions in chest
films and found that a density difference of 0.025 to 0.060 between the nodule and surround was
necessary for detection. They also found that the density difference at which a nodule is first
discernible is a function of viewing distance and the characteristics of the object's border and
adjacent structures. They suggest that the optimum viewing distance varies for different lesions
[Hemmingsson, 1975].

The parameters of the conspicuity metric, K2, which was determined by Revesz et al., are those
features which distinguish seen and unseen nodules. The features of an undetected but visible nodule
were compared with those from the same nodule which was only detected in a later film. This
metric was found to best distinguish between populations of detected and undetected nodules. This
metric has as its parameters: edge gradient and contour (steepness and smoothness, respectively)
which are represented in the parameter EI, Edge Index; and surround complexity. These
parameters provided better discrimination between the two populations of nodules than any of the
following parameters tested: size, density difference between nodule and surround, and the rate of
change of density around the nodule border as determined by the Laplacian [Revesz, 1977]. Edge
gradient and edge uniformity are used by ANDS to discriminate among CNs. These are
incorporated in ANDS as Edge Strength and Edge Visibility, respectively. The standard deviations
of these measures are in fact used in conjunction with other measures by the pattern recognizer to
classify CNs.
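
The abstract states that the pattern recognizer uses linear discriminant analysis, but this section
does not give its form. As an illustration under stated assumptions, the sketch below fits a
two-class Fisher linear discriminant to two features per candidate nodule. The restriction to two
features, the function names, and the midpoint threshold are simplifications chosen for this
example and are not a description of the ANDS classifier.

```python
def fisher_lda_2d(class_a, class_b):
    """Fit a two-class Fisher discriminant to 2-D feature points.

    Returns (w, threshold); a point p is assigned to class_a when
    w[0]*p[0] + w[1]*p[1] exceeds threshold.
    """
    def mean(pts):
        n = len(pts)
        return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

    def scatter(pts, m):
        # Entries (xx, xy, yy) of the symmetric 2x2 scatter matrix.
        sxx = sum((p[0] - m[0]) ** 2 for p in pts)
        sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in pts)
        syy = sum((p[1] - m[1]) ** 2 for p in pts)
        return sxx, sxy, syy

    ma, mb = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    sxx, sxy, syy = (sa[i] + sb[i] for i in range(3))  # pooled within-class scatter
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    # w = Sw^-1 (ma - mb), using the closed-form inverse of a 2x2 matrix.
    w = ((syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det)
    # Decision threshold halfway between the two projected class means.
    threshold = 0.5 * (w[0] * (ma[0] + mb[0]) + w[1] * (ma[1] + mb[1]))
    return w, threshold


def is_class_a(w, threshold, point):
    """True when the projected point falls on the class_a side."""
    return w[0] * point[0] + w[1] * point[1] > threshold
```

In use, class_a would hold feature pairs measured on known nodules and class_b those measured on
non-nodule CNs from the training films; a CN whose projection falls on the non-nodule side would be
dropped from the list shown to the radiologist.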

Errors in detection of nodules may occur in four levels of the search process: orientation, search,
recognition, and decision making [Kundel, 1978]. Orientation errors occur when the observer is
unfamiliar with chest films and cannot differentiate abnormal objects from background features. A
search error occurs when an area containing a nodule is overlooked. When a nodule is scanned
over but not recognized, a recognition error occurs. Decision-making errors occur when an
ambiguous figure is recognized but either falsely accepted or rejected. Kundel et al. claim that
scanning errors account for 30% of all detection errors; recognition errors 25%; and decision-making
errors 45% among skilled observers [Kundel, 1978].

Studies of film reader error over the past twenty years indicate error rates (films missed) of 25-30%.
Despite advances in radiograph technology, readers have not been able to find more nodules.
Reading errors may be attributed to faulty processing of visual information which falls into two
domains: perceptual (unconscious process), and cognitive (conscious process).

Spatial vision research perhaps offers some explanations for human limitations in nodule detection.
The importance of foveal vision in nodule detection suggests that high spatial frequencies influence
detection. Kundel notes that "...small objects like pulmonary nodules can only be perceived if they
are close to the center of the visual field. The more complex the visual information, the closer to the
center of the visual field (e.g. the fovea) a small object must be imaged to be perceived" [Kundel,
1978]. In order for a square wave to be distinguished from its fundamental sinusoid, at least its third
harmonic must exceed visibility threshold [Campbell, 1968]. Similarly, an edge will go undetected if
its high-frequency components are below contrast threshold.
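
The role of the third harmonic can be made concrete with a small numerical check. A unit-amplitude
square wave has only odd sine harmonics, with amplitude 4/(pi k) for harmonic k, so its third
harmonic carries one third of the amplitude of its fundamental. The sketch below estimates these
coefficients numerically; the function names and the sample count are choices made for this
example.

```python
import math

def square_wave(t):
    """Unit-amplitude square wave of period 1: +1 on the first half, -1 on the second."""
    return 1.0 if (t % 1.0) < 0.5 else -1.0

def sine_harmonic_amplitude(f, k, n=4096):
    """Midpoint-rule estimate of the k-th sine coefficient of f over one period."""
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n  # sample mid-interval to stay off the jumps
        total += f(t) * math.sin(2.0 * math.pi * k * t)
    return 2.0 * total / n

fundamental = sine_harmonic_amplitude(square_wave, 1)  # close to 4/pi
third = sine_harmonic_amplitude(square_wave, 3)        # close to 4/(3*pi)
ratio = third / fundamental                            # close to 1/3
```

Read against the sentence above, the square wave becomes distinguishable from a sinusoid only once
a component of about one third of the fundamental's amplitude is itself above threshold, which is
why a low-contrast edge can be present in the image and still not be seen as an edge.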

Structured noise in the surround has a significant effect on how humans detect
nodular abnormalities [Kundel, 1975]. Structured noise decreases the possibility of detection and
increases the time for a reader to make a response [Kundel, 1975]. The effects of surround may be
generalized to fall into two categories: overlapping and non-overlapping [Kundel, 1978]. The
non-overlapping surround contributes to the overall complexity of the image, acting as a camouflage,
exerting its effect on visual search rather than on visibility [Kundel, 1978]. An overlapping
surround leads to edge obliteration and causes difficulties in detection as well as in physical
measurements [Kundel, 1978]. An occluding rib is an example of such a surround effect. Foveal
performance is inhibited by the presence of extra stimuli in the periphery as well as in the fovea
itself [Mackworth, 1965]. Diminution in the ability to perceive a given spatial frequency, e.g. the
third or higher harmonics which would characterize the edge, may be due to lateral inhibition from
the surround. Adaptation of cats to a given spatial frequency has been shown to raise the contrast
required to produce a given response by a factor of about four [Movshon, 1979]. The effect of
structured noise is evident in the failure of image processing, "TV processing" according to Kundel,
to make nodules any more visible [Kundel, 1968]. Extensive processing makes nodules more
conspicuous if their locations are known in advance, while detection is more difficult if their
locations are unknown [Kundel, 1975]. This may represent a structured noise effect since processing
may increase the structured noise more than it enhances the target abnormality [Kundel, 1975].


1.4  -  Previous  Work:  Digital  Processing  and  Analysis  of  Chest  Radiographs 

There are two main aspects to automated diagnosis of radiographs: the diagnosis of patterns, and the
acquisition and analysis of large amounts of data [Henschke, 1979]. The acquisition and analysis of
data preceded the analysis of image patterns, with the former occurring in the early-to-mid 60's and
the latter in the early 70's. Diagnosis of patterns in the chest encompasses measurements for the
diagnosis of rheumatic heart disease [Hall et al., 1971]; classification of coal workers'
pneumoconiosis [Hall et al., 1976] [Jagoe et al., 1975]; analysis of pulmonary infiltration [Tully,
1978]; and detection of the solitary pulmonary nodule [Ballard, 1973]. The acquisition and analysis
of large amounts of radiographic data, symptoms, and test results to arrive at a diagnosis has as its
main stumbling block the inconsistent interpretation of the data from the radiograph by a human
observer into a form amenable to processing by a computer [Henschke, 1979].

Coding radiographs is perhaps the first recorded instance of automated diagnosis. This involves
quantifying aspects of the visual image into numerical sequences which are amenable to computer
analysis. In this method the radiologist codes observations for computer analysis [Lodwick, 1963]. It
was found that the problem inherent in handling such data is the conversion of the visual data into
the exact qualitative and quantitative forms required by the computer [Lodwick, 1963]. Yamamura
et al. point out similar difficulties in coding radiographic findings, namely poor reproducibility
and differing readings from different viewers [Yamamura, 1965]. They conclude that "...the highly
complicated findings of the pulmonary lesion are beyond the ability of pattern recognition of an
electronic computer. They are (best) left to the management of human brains" [Yamamura, 1965].
Meyers et al. digitized a radiograph using a flying spot scanner and displayed the image on an
oscilloscope; they also displayed the derivative function of the image. They report that the
radiographic image retrieved from their computer is "...the most informative image of a portion of
the lung and ribs that (they) have ever seen." Furthermore, to the encouragement of computer
visionaries, they predicted that digital analysis of mechanically scanned radiographs would be
possible [Meyers, 1963].

Kundel et al. suggest that digital image processing of the chest radiograph is necessary to reduce
the near 30% false negative rate of the human viewer; they outline image processing techniques
[Kundel, 1969]. Moore claims that image processing would probably make a useful contribution to
radiology by clarifying pictures of low quality [Moore, 1969]. Moore predicted that image processing
of radiographs would permit:

-removal of image noise
-correction for geometric distortion
-elimination of non-uniform brightness,

and automated analysis techniques would:

-search for tuberculosis and heart enlargement in mass screenings
-determine physiological age from x-rays of the hands
-detect lesions in mammograms
-predict time of tooth eruption
-analyze angiograms
-calculate bone densities.

Fourier filtering techniques to enhance the appearance of the pulmonary nodule were evaluated by
Ziskin. He found these techniques incapable of separating the nodule from its surround [Ziskin,
1972]. Similarly, Kundel found that processing the radiographic image did not lead to an increase
in nodule detection [Kundel, 1975].

The analysis of pulmonary infiltrates and classification of pneumoconiosis is essentially a problem of
texture analysis. Quantitative texture measures are used to distinguish between normal lung,
alveolar infiltrates, and interstitial infiltrates with 95% accuracy in the training phase, and 90%
accuracy in the testing phase [Tully, 1978]. The image texture is analyzed using Ausherman's
SGLDM (spatial grey level dependence method). This method is essentially statistical; its measures
are based on the probability of going from a specific pixel value to another specific value at a given
point in a textured image. Differences in image quality due to exposure time and development
conditions are eliminated by linearly redistributing the image so that contrast is normalized and the
number of grey levels is reduced; fewer grey levels lead to greater accuracy when using the SGLDM
[Tully, 1978].
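
The idea behind the SGLDM can be illustrated with a few lines of Python. This is only a minimal sketch under stated assumptions, not the implementation used by Tully or Ausherman: the function names (requantize, cooccurrence, contrast), the displacement arguments dx and dy, the min-max form of the linear redistribution, and the choice of "contrast" as the example measure are all hypothetical.

```python
def requantize(image, levels):
    """Linearly stretch the image to its own min..max and reduce it to
    `levels` grey levels (assumed reading of 'linearly redistributing')."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    return [[(p - lo) * levels // (hi - lo + 1) for p in row] for row in image]


def cooccurrence(image, dx, dy, levels):
    """Return P, where P[a][b] estimates the probability of going from grey
    level a at (x, y) to grey level b at (x + dx, y + dy)."""
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    height, width = len(image), len(image[0])
    for y in range(height):
        for x in range(width):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < width and 0 <= y2 < height:
                counts[image[y][x]][image[y2][x2]] += 1
                total += 1
    return [[c / total for c in row] for row in counts]


def contrast(p):
    """One commonly used texture measure: sum of (a - b)^2 * P[a][b]."""
    n = len(p)
    return sum((a - b) ** 2 * p[a][b] for a in range(n) for b in range(n))


image = requantize([[10, 10, 40], [10, 40, 40]], levels=2)
p = cooccurrence(image, dx=1, dy=0, levels=2)
print(p, contrast(p))
```

With fewer grey levels the matrix P is smaller and each entry is estimated from more pixel pairs, which is one plausible reason why reducing the number of levels helps.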


1.4  -  Previous  Work:  Digital  Processing  and  Analysis  of  Chest  Radiographs 


Two systems for the detection and classification of coal workers' pneumoconiosis have been
described, one in the U.S. by Kruger et al. and one in Great Britain by Jagoe and Paton. Federal
laws enacted in 1969 require that coal workers be regularly examined for pneumoconiosis. Criteria
for classifying the severity of the disease have been standardized. The U.S. system employs
opto-digital analysis while the U.K. system operates entirely on digital images. The opto-digital
method involves imaging the Fourier spectrum of the radiograph with a laser and then analyzing the
spectrum with annular wedges to extract a frequency signature which is then subjected to statistical
classification using linear discriminant functions [Kruger, 1977]. Jagoe and Paton's method for
classifying pneumoconiosis involves measuring the unevenness of the density distribution within
square grids 3.6mm on a side. The diagnoses by their process have demonstrated a 0.88 correlation
with those by radiologists [Jagoe, 1975].

The earliest known work on the automated detection of the solitary nodule may be attributed to
D.H. Ballard and J. Sklansky [Ballard, 1973]. This work involves image processing to enhance
detection of a tumor edge in digital representations of chest radiographs and radioisotope liver
scans. The detection of this edge was deemed difficult for two reasons: changes in the image
density about the perimeter of the nodule which are caused by background density gradients; and
the presence of ribs which may occlude the nodule [Ballard, 1973]. The work done by Ballard is
the foundation of ANDS and this thesis.


2 - Introduction to ANDS


A chest radiograph is the input of ANDS, which processes and analyzes the image for CNs
(Candidate Nodules). The output of ANDS is a list of CN sites which are displayed for review by
a human viewer. Fig. 2.0.1 presents the essence of ANDS. Table 2.0.1 illustrates the steps in
ANDS, their inputs, outputs, and effects. The purpose of the first step, photographic reduction and
digitization of the chest radiograph, is to render the 14"x17" chest film into a form amenable to
digital processing. The photographic reduction step is necessary because the available image
digitizer is not capable of digitizing any image that is larger than 10"x10". The goal of this stage is
to achieve a linear mapping between optical densities (in the lung parenchyma, that is, in the lung
tissue) and pixel values and to maintain the required spatial resolution. In a preprocessing step the
background variation is removed using a spline filter and the contrast is enhanced with histogram
equalization. CNs are located using a Hough-like technique, which votes for CNs in an
accumulator array whose dimensions correspond to the image dimensions. The peaks in the
accumulator array correspond to the locations of centers of closed circular shapes. Following the
application of the Hough technique the accumulator array is smoothed by convolution with a
Gaussian operator. This improves the estimate of the center of a CN, which is represented as a
local peak. The smoothed accumulator array is searched for a specified number of the highest
valued peaks. The locations of these peaks correspond to the locations of centers of CNs. The
locations of the nodules in the films that were tested are known. A metric has been devised to
measure the performance of the nodule detection process. This metric uses the list of CN locations
that is produced by ANDS and the locations of the known nodules. Since some of the reported
CNs are obvious errors (for example, lung borders and ribs are common false positives), two
procedurally driven recognition experts and a technique for linear discriminant analysis have been
incorporated in ANDS to reduce the false positive rate. Each of the stages in ANDS is described in
greater detail in the following sections.


INPUT                             PROCESS                           OUTPUT

14"x17" chest film                Photographic reduction            4"x5" negative on Kodak Commercial film;
                                                                    size of radiograph is reduced to 0.26
                                                                    times original

4"x5" negative                    Digitization                      digital image: 8 bits/pixel; sampled at
                                                                    100 µm; optical densities are converted
                                                                    to pixel values

digital image of chest radiograph Spline filtering and histogram    enhanced digital image; background
                                  equalization                      variation is removed and contrast is
                                                                    enhanced

spline filtered histogram         Candidate nodule detection        image that contains votes for locations
equalized image                                                     of CN centers

accumulator image                 Smooth accumulator image          smoothed image whose peaks represent
                                                                    the locations of CN centers; the groupings
                                                                    of votes cast in the previous step are
                                                                    concentrated about their center of mass

smoothed-accumulator image        Search smoothed accumulator image list of CN center coordinates ordered by
                                                                    accumulator value

accumulator list, spline filtered Elimination of false positives    modified ordered list of CN center
image                                                               coordinates; false positives reduced

file of known nodule locations,   Performance evaluation            a report of the number of false positives,
list of locations of CN centers                                     the true positive rate, the CHM, and the
                                                                    DM

Table 2.0.1 - The inputs and outputs of ANDS.


2.1  -  Photographic  Reduction  and  Digitization:  Creation  of  a  Test  Database 


Anterior-posterior chest radiographs which are representative of the general population of such chest
films were obtained from Dr. John Wandtke of the School of Medicine of the University of
Rochester. The performance of ANDS was evaluated using these films. Hence, these films are
referred to as the ANDS database.

Fifty 14"x17" chest radiographs, 44 containing at least one nodular abnormality and 6 normals,
were photographically reduced and digitized. The dimensions of the digitized images are about
900x900 pixels. Fig. 2.1.1 illustrates the reproduction process. A Sinar "C" camera with a 240mm
Xenar lens was used to image each radiograph onto Kodak Commercial film. The non-lung area of
each radiograph was masked prior to copying using exposed x-ray film. Only the lung areas were
imaged when copying the radiograph, i.e. no light was allowed to pass through the non-lung area of
the radiograph. This was done to reduce camera/lens flare in order to obtain a more linear transfer
from optical density to pixel value, see Chapter 3.1. A 10" Kodak #1 step wedge and a tri-bar
target were included when copying each radiograph. These provided means for quantitatively
assessing the transfer of densities (tone reproduction), and assuring that a nominal (as given by the
Nyquist sampling relation) spatial resolution was maintained. Since the radiograph was digitized on
an Optronics C4100 rotating drum scanner at a contiguous sampling interval of 100 microns with
circular apertures of 100 microns (illumination and collection), a spatial resolution in excess of the
nominal 1.25 lp/mm was maintained. The film was developed in Kodak HC-110 developer,
dilution D, for 5 minutes at 68 ± 1/4 °F with R.I.T. tray rock agitation. The Optronics was
calibrated, using a 5" Kodak #2 step wedge, to provide maximal useful range and optimal
discrimination between densities around 2.65, the upper limit of the lung region densities in the
photographic reduction. See Appendix 9.1 for details on the calibration of the Optronics scanner.
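
The resolution claim can be checked with a little arithmetic. The sketch below is one possible reading, not a calculation taken from the thesis: it assumes that the 1.25 lp/mm figure refers to the full-size radiograph, and that the Nyquist limit on the reduced film is carried back to the radiograph through the 0.26 magnification. The variable names are hypothetical.

```python
MAGNIFICATION = 0.26           # photographic reduction factor (Table 2.0.1)
SAMPLING_INTERVAL_MM = 0.100   # 100 micron sampling interval on the reduced film
MM_PER_INCH = 25.4

# Nyquist limit on the reduced film: one line pair needs two samples.
nyquist_on_film = 1.0 / (2.0 * SAMPLING_INTERVAL_MM)       # lp/mm on the film

# Detail on the radiograph is 1/0.26 times larger than on the film, so the
# equivalent limit on the radiograph is lower by the magnification factor.
nyquist_on_radiograph = nyquist_on_film * MAGNIFICATION    # lp/mm

# Pixel dimensions of an unmasked 14"x17" film after reduction and sampling.
width_px = 14 * MM_PER_INCH * MAGNIFICATION / SAMPLING_INTERVAL_MM
height_px = 17 * MM_PER_INCH * MAGNIFICATION / SAMPLING_INTERVAL_MM

print(round(nyquist_on_film, 2), round(nyquist_on_radiograph, 2))
print(round(width_px), round(height_px))
```

Under these assumptions the limit referred to the radiograph is about 1.3 lp/mm, which is in excess of the nominal 1.25 lp/mm, and the full film would occupy roughly 925 x 1123 pixels, the same order as the "about 900x900" images obtained once the non-lung area is excluded.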

A statistical analysis was performed to determine the optimal exposure and flare condition. The
exposure/flare condition that resulted in a statistically insignificant second-order term in a regression
of pixel value as a function of radiograph density was chosen from the 9 exposure/flare conditions on
3 representative films tested. This exposure/flare condition was used when the 50 films that
constitute the database were copied. The nodule size/age, and
patient-sex/disease/number-of-nodules-per-film distributions of the films in the database are
illustrated in Figs. 2.1.2 and 2.1.3,
respectively.

[Figure 2.1.1 diagram; labels recovered from the figure:]

1. Photographic Reduction
   Light Table: Illumination = 16.000 fc
   Non-lung Area is Masked
   Tri-bar target; Kodak #2 Step Tablet
   Camera: Sinar-C
   Lens: Xenar 240mm, f/5.6
   Exposure: 10 sec, f/22
   Magnification = 0.26
   Image recorded on Kodak Commercial Film

2. Development
   HC-110 B @ 68 F for 5 minutes; RIT tray rock agitation

3. Digitization
   Image is scanned on Optronics C-4100
   Sampling interval = 100 um
   Sampling aperture = 100 um
   Number of grey levels = 8
   Scanner calibrated for densities in range .20-2.70

Figure 2.1.1 - The photographic reduction and digitization process that was used when copying the 50 chest
radiographs that constitute the ANDS database. The 14"x17" radiograph is photographically reduced onto
4"x5" Kodak Commercial film by the camera/lens system. The exposed sheet film is developed. The
developed film is digitized on an Optronics rotating drum scanner to a 900x900 pixel image.


2.2 - Preprocessing: Spline Filtering and Histogram Equalization

Spline filtering and histogram equalization serve to make the small details of the image more visible
by subtracting background variation. Spline filtering is similar to field flattening [Pearson et al.].
Essentially, the low frequency components of the image are removed when spline filtering. A low
frequency approximation of the image is made by interpolating with B-splines. This interpolated
image is subtracted from the original image. The spline filter has three steps: interpolating the
original image to produce a two-dimensional approximation; subtracting the interpolated image
from the original image; and expanding the contrast of the spline filtered image using histogram
equalization. The parameter of the spline filter is the interval at which the interpolation points are
taken, the knot spacing. This interval corresponds to the number of points that are interpolated
between knots. Figure 2.2.3 illustrates the effect of histogram equalization on an image that was
filtered at two different knot spacings. As the distance between the sampled knots decreases the
interpolated image more closely approximates the original image. More frequently sampled images
contain more high frequency content; this is evident in Figure 2.2.3.

The spline filter is faster than the two-dimensional FFT. It requires O(N) additions and O(N/k)
multiplications while the FFT requires O(N log N) additions and multiplications, where N is the
number of pixels in the image and k is the knot spacing. The number of real additions and
multiplications that are required for the base-2 FFT are [Brigham]:

\[ \text{Real multiplications} = (2\gamma - 4)N + 4 \]
\[ \text{Real additions} = (3\gamma - 2)N + 2 \]

where:

N = number of pixels in image
\gamma = \log_2 N (where N is a power of 2)

The number of real additions and multiplications that are required by the spline filter were
determined to be:

\[ \text{Real multiplications} = 132 - (N^{1/2} - 1)(86 + 46/k) + (N^{1/2} - 1)^2 (32/k) \]

\[ \text{Real additions} = 144 + (N^{1/2} - 1)^2 (20/k + 16/k^2 + 4) - (N^{1/2} - 1)(86 + 12k + 46/k) \]

where:

k = knot spacing; 1 < k < N/4.


Note that for a two-dimensional FFT, twice as many multiplications and additions are required (for
forward and reverse transforms) as well as at most N multiplications for the filtering operation.
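
To give a feeling for the size of the difference, the two sets of counts can be evaluated numerically. The sketch below is illustrative only. It assumes that the garbled exponents in the spline filter expressions read N^(1/2), and it takes the FFT multiplication count as (2γ - 4)N + 4, which is the count obtained when each non-trivial complex multiplication is charged four real multiplications and two real additions. The function names are hypothetical.

```python
import math


def fft_real_ops(n_pixels):
    """Real (multiplications, additions) for one base-2 FFT of N points."""
    gamma = n_pixels.bit_length() - 1            # log2(N)
    assert 2 ** gamma == n_pixels, "N must be a power of 2"
    mults = (2 * gamma - 4) * n_pixels + 4
    adds = (3 * gamma - 2) * n_pixels + 2
    return mults, adds


def spline_real_ops(n_pixels, k):
    """Real (multiplications, additions) for the spline filter, as read here."""
    s = math.sqrt(n_pixels) - 1                  # assumed reading: N^(1/2) - 1
    mults = 132 - s * (86 + 46 / k) + s * s * (32 / k)
    adds = 144 + s * s * (20 / k + 16 / k ** 2 + 4) - s * (86 + 12 * k + 46 / k)
    return mults, adds


n = 2 ** 20                                      # a 1024 x 1024 image
print("FFT   :", fft_real_ops(n))
print("spline:", spline_real_ops(n, k=16))
```

For a 1024x1024 image and a knot spacing of 16, this reading gives roughly 2.0 million multiplications for the spline filter against roughly 37.7 million for a single FFT pass, before the forward/reverse doubling is taken into account.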


B-splines are used to interpolate the spline image. The interpolated image is composed of piecewise
continuous polynomials that are essentially linear combinations of the B-spline basis functions. Knot
points define the guiding polygon, a convex hull under which the interpolated function is confined.
The variation diminishing property of the spline functions assures that the interpolated function will
always lie beneath the convex hull that is defined by the guiding polygon. B-spline basis functions
have the property of local support, which permits the positioning of the knots to have local control,
see Fig. 2.2.1. That is, if the position of a knot were perturbed the shape of the interpolated
function would change only in the vicinity of that knot. The spline filter is spatially variant, unlike
the FFT which is spatially invariant, due to the local support property of the spline basis functions.
The general equation for a B-spline curve is [Wu et al.]:

\[ P(u) = [x(u),\ y(u)] = \sum_{i=0}^{M} B_{i,M}(u)\, V_i \]

where B_{i,M}(u) is the i-th basis function, a compound polynomial of order M. This polynomial is
continuous up to and including the (M-2)-th derivative. The degree of the polynomial is M-1. The
following equation is a simplification of the above for cubic, M = 4, B-splines; the type used in this
work.

\[ P_i(S) = [\,S^3\ \ S^2\ \ S\ \ 1\,]\,[C]\,[\,V_{i-1}\ \ V_i\ \ V_{i+1}\ \ V_{i+2}\,]^T \]

\[ [C] = \frac{1}{6} \begin{bmatrix} -1 & 3 & -3 & 1 \\ 3 & -6 & 3 & 0 \\ -3 & 0 & 3 & 0 \\ 1 & 4 & 1 & 0 \end{bmatrix} \]

where:

C = a matrix of coefficients of the periodic uniform B-spline basis functions
i = {0, 1, ..., m} where m+1 is the number of spans associated with the guiding polygon,
which has m+1 sides and m+1 vertices (V_0, ..., V_m)
S = (u - u_i)/(u_{i+1} - u_i); S ∈ [0, 1].

Generally, B-spline functions are used to interpolate continuous surfaces, as in computer graphics.
These shapes are usually closed curves. However, the spline filter requires splining of an open
curve. For an open B-spline curve two end vertices, V_{-1} and V_{M+1}, are extrapolated. See Fig.
2.2.1 for an illustration of splining on an open curve.


The equations of the new ends are:

\[ P_0(0) = \tfrac{1}{6}\,(V_{-1} + 4V_0 + V_1) \]

\[ P_{M-1}(1) = \tfrac{1}{6}\,(V_{M-1} + 4V_M + V_{M+1}) \]
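
As a check, the end equations can be obtained directly from the cubic form by substituting S = 0 in the first span and S = 1 in the last span. The short derivation below assumes the standard periodic uniform cubic B-spline matrix for C; it is offered as a worked illustration rather than as part of the original derivation.

```latex
\begin{aligned}
P_i(S) &= \tfrac{1}{6}\,[\,S^3\ \ S^2\ \ S\ \ 1\,]
\begin{bmatrix} -1 & 3 & -3 & 1\\ 3 & -6 & 3 & 0\\ -3 & 0 & 3 & 0\\ 1 & 4 & 1 & 0 \end{bmatrix}
[\,V_{i-1}\ \ V_i\ \ V_{i+1}\ \ V_{i+2}\,]^T \\[4pt]
S = 0:\quad [\,0\ \ 0\ \ 0\ \ 1\,]\,[C] &= \tfrac{1}{6}\,[\,1\ \ 4\ \ 1\ \ 0\,]
  \;\Rightarrow\; P_0(0) = \tfrac{1}{6}\,(V_{-1} + 4V_0 + V_1) \\[4pt]
S = 1:\quad [\,1\ \ 1\ \ 1\ \ 1\,]\,[C] &= \tfrac{1}{6}\,[\,0\ \ 1\ \ 4\ \ 1\,]
  \;\Rightarrow\; P_{M-1}(1) = \tfrac{1}{6}\,(V_{M-1} + 4V_M + V_{M+1})
\end{aligned}
```

The S = 1 row is simply the sum of the columns of the matrix, which is why the curve ends on a weighted mean of the last three vertices, including the extrapolated one.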

While testing the spline filter I noticed that the edges of the lungs had a splotchy appearance and
the lung area lacked detail, Fig. 2.2.2. This splotchy appearance was attributed to both the effect of
the discontinuity at the lung border and to the effect of the significantly darker non-lung area on
the interpolated image. Presumably this appearance is due to the interpolated image
under-approximating the original at the border. The amount of under-approximation is dependent
on the closeness of a knot to the edge of the lung. The pixel values in the non-lung area are set to
the mean pixel value of the lung regions prior to splining; this reduces the noticed effect, Fig. 2.2.2.

A splined image with knot spacing k is generated in two steps. First, every k-th row of the image is
splined and then each column is splined. The values used in splining the columns are those values
that were interpolated when splining the rows. The endpoints of the rows and columns are
obtained by a weighted extrapolation of the neighboring knot values; for example, for the left side:
V[-1,y] = V[0,y]/2 + V[1,y]/3 + V[2,y]/6 and for the bottom left corner: V[-1,-1] = (V[...] + V[...] +
V[...])/3. The region outside the lung parenchyma, the non-lung region, is set to the mean value
of the lung region prior to splining.
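
The row pass of the spline filter can be sketched for a single scanline as follows. This is a minimal sketch under several assumptions and is not the code used in ANDS: the function names are hypothetical, the row length is assumed to be a multiple of k plus one so that the last pixel is a knot, and the right-hand end vertex is extrapolated with the mirror image of the left-hand rule, which the text does not state. The column pass would apply the same routine to the columns of the row-splined values.

```python
# Periodic uniform cubic B-spline basis matrix (the 1/6 factor is applied below).
C = [[-1, 3, -3, 1],
     [3, -6, 3, 0],
     [-3, 0, 3, 0],
     [1, 4, 1, 0]]


def spline_background(row, k):
    """Low-frequency approximation of one scanline from knots k pixels apart."""
    assert (len(row) - 1) % k == 0 and len(row) >= 2 * k + 1
    v = [float(row[j]) for j in range(0, len(row), k)]     # knots V_0 .. V_M
    v_first = v[0] / 2 + v[1] / 3 + v[2] / 6                # V_{-1}, from the text
    v_last = v[-1] / 2 + v[-2] / 3 + v[-3] / 6              # V_{M+1}, assumed mirror
    v = [v_first] + v + [v_last]                            # v[j + 1] holds V_j
    spans = len(v) - 3                                      # spans 0 .. M-1
    background = []
    for x in range(len(row)):
        i = min(x // k, spans - 1)                          # span index
        s = (x - i * k) / k                                 # S in [0, 1]
        powers = [s ** 3, s ** 2, s, 1.0]
        weights = [sum(powers[r] * C[r][c] for r in range(4)) / 6
                   for c in range(4)]
        # Span i uses V_{i-1}, V_i, V_{i+1}, V_{i+2}, i.e. v[i] .. v[i + 3].
        background.append(sum(w * v[i + c] for c, w in enumerate(weights)))
    return background


def spline_filter_row(row, k):
    """Subtract the spline background from the scanline."""
    return [p - b for p, b in zip(row, spline_background(row, k))]


row = [10] * 9
row[2] = 50                      # a small bright detail between two knots
print(spline_filter_row(row, k=4))
```

In the example the detail at pixel 2 does not fall on a knot, so the background stays flat and the detail survives the subtraction, which is the behaviour the filter is intended to have for structures smaller than the knot spacing.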



Histogram equalization is a method of expanding the contrast of an image. A cumulative frequency
histogram of pixel values, T(r), is determined from the frequency distribution of pixel
values, p(r), in the following way:

\[ T(r) = \sum_{w=0}^{r} p(w) \]

where r ∈ {0, ..., maximum pixel value}.

The histogram equalization, E(r), of pixel value r is given by:

\[ E(r) = \frac{r - P_{min}}{P_{max} - P_{min}} \cdot \frac{T(r)}{T(P_{max})} \cdot P^{*} \]

where, P_min = the minimum pixel value represented in the pixel value histogram.

P_max = the maximum pixel value represented in the pixel value histogram.

P* = the maximum pixel value in the histogram equalized image.
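
The role of the cumulative histogram T(r) can be illustrated with a short sketch. Note that this shows the common textbook mapping, in which each pixel value r is sent to T(r)/T(P_max) scaled by the maximum output value, and not necessarily the exact expression for E(r) used in this work, which carries an additional linear factor. The function name and the default of 255 for the maximum output value are assumptions.

```python
def equalize(image, p_star=255):
    """Map each pixel r to T(r) / T(P_max) * P*, using integer arithmetic."""
    levels = max(max(row) for row in image) + 1
    p = [0] * levels                         # frequency distribution p(r)
    for row in image:
        for r in row:
            p[r] += 1
    t, running = [], 0                       # cumulative histogram T(r)
    for count in p:
        running += count
        t.append(running)
    total = t[-1]                            # T(P_max): every pixel is counted
    return [[t[r] * p_star // total for r in row] for row in image]


print(equalize([[0, 0, 1, 1], [2, 2, 3, 3]]))
```

Four equally populated input levels that occupy only the bottom of the 8-bit range are spread evenly over the whole output range, which is the contrast expansion described above.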

In order to facilitate discussions of operations on the image, the following notation will be used
throughout:

Γ_n[x<:X>, y<:Y>]

Γ represents an image array composed of n-bit pixels, X pixels per scanline, and Y scanlines;
Γ ∈ {A, ..., Z}; the field <·> is optional; the domains of the indices are:

0 ≤ x < X, x = {0, ..., (X-1)}

0 ≤ y < Y, y = {0, ..., (Y-1)}

and the range of the image array is:

0 ≤ Γ_n[x,y] < 2^n, Γ_n[x,y] = {0, ..., (2^n - 1)}.

f(G[x,y], p_1, ..., p_n)

f is a function defined in the domain of the image, G[x,y];

p_1, ..., p_n are the parameters of f.

The spline filtered histogram equalized image, F, is produced from the input image, I, by the spline
filter function, f, whose parameter is knot spacing, k.

F_8[x:X, y:Y] = f(I_8[x:X, y:Y], k)

Knot spacing, the only parameter of the filter, was tuned to provide optimal detection of the known
nodules.

[Figure 2.2.3 - The effect of histogram equalization on a spline filtered image at two knot spacings.
The images on the right correspond to those on the left; the images are 470 pixels per scanline; the
top images were filtered at a knot spacing of 5. The remainder of the caption is not legible.]


2.3 - Candidate Nodule Detection


Candidate nodule detection has three steps: locating the CN center with the CN Expert, image
smoothing to accumulate the votes for CN centers, and searching for a specified number of CN
centers. The CN detector reports the locations and values of the closed shapes in the image. The
value associated with the reported CN center is a function of its edge gradient magnitude and its
size. This value is computed by a Hough-like technique [Ballard, 1973]. The essence of the CN
Expert is a circle detector which uses embedded knowledge about the appearance of a nodule-like
shape. The knowledge used by the CN Expert is: that the CN is a closed convex shape that is
lighter than its surround. This knowledge is used by the circle detector to determine the location of
the center. A simple Hough circle-center locator is used in conjunction with image smoothing by
convolution with a Gaussian function to provide a robust CN detector; it is sensitive to a variety of
closed shapes, not just circles.

Since the CN Expert is both compute bound and operates on a large (approximately 1 Mbyte)
image, it has been designed to minimize the size of its resident set. When many users are on the
system, large programs such as this one are swapped. This causes the CN Expert to run slower.
The CN Expert, the Gaussian smoother, and the image search operations all operate on horizontal
scanlines in a window that moves from the top of the image to the bottom. Only a few scanlines
are resident in primary memory at a time. Essentially, the user specifies the number of scanlines
that are to be resident in primary memory; these lines are read in; the next group (a specified
number) of scanlines are read in when a scanline that is above the topmost scanline in the resident
window is accessed; access of pixels in scanlines that are below the bottommost scanline in the
window (in primary memory) is not possible. This technique has proven useful in speeding
computation time.

The spline-filtered, histogram equalized image, F[], is processed by the CN center locator, c(), to
produce an image array, C[], that contains the centers of proposed CNs.

C_8[x:X/resolution, y:Y/resolution] = c(F[x:X, y:Y], radius, resolution)

where c() is represented by the following algorithm:

for( all image points: y, x in F[]; if edge magnitude > T )
BEGIN
    cx = x + cos( Edge Angle ) * radius;
    cy = y + sin( Edge Angle ) * radius;

    C[cx, cy] <- C[cx, cy] + 1 ;  /* increment accumulator array at CN center */
END


where:

Edge Angle - the angular orientation of the edge at [x, y] as determined by a Sobel edge operator.

T - a threshold value.

radius - the radius (in pixels) of the sought-after nodule; specified by the user.

resolution - an integer that specifies the reduction between the dimensions of the input image and
the accumulator.
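
The voting step above can be sketched in Python. This is a minimal illustration and not the ANDS implementation: the function names are hypothetical, the "Edge Angle" is taken here to be the direction of the Sobel gradient (which points toward the lighter side of an edge, so that the votes of a shape lighter than its surround fall inside it), and the image is a plain list of rows.

```python
import math


def sobel(image, x, y):
    """Return (gx, gy) from the 3x3 Sobel operator at an interior pixel."""
    gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
          - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
    gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
          - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
    return gx, gy


def locate_centers(image, radius, threshold, resolution=1):
    """Cast one vote per strong edge point, `radius` pixels along the gradient."""
    height, width = len(image), len(image[0])
    acc = [[0] * (width // resolution) for _ in range(height // resolution)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx, gy = sobel(image, x, y)
            if math.hypot(gx, gy) <= threshold:
                continue                        # edge magnitude must exceed T
            angle = math.atan2(gy, gx)          # assumed meaning of Edge Angle
            cx = int(round(x + math.cos(angle) * radius)) // resolution
            cy = int(round(y + math.sin(angle) * radius)) // resolution
            if 0 <= cx < len(acc[0]) and 0 <= cy < len(acc):
                acc[cy][cx] += 1                # increment accumulator at CN center
    return acc


# A light disc of radius 8 centred at (20, 20) on a dark background.
size, centre, r = 41, 20, 8
disc = [[100 if (x - centre) ** 2 + (y - centre) ** 2 <= r * r else 0
         for x in range(size)] for y in range(size)]
acc = locate_centers(disc, radius=r, threshold=50)
votes, px, py = max((acc[y][x], x, y) for y in range(size) for x in range(size))
print(votes, px, py)
```

Because the Sobel direction is only approximate on a sampled edge, the votes form a small cluster around the true center rather than a single point; this scatter is what the subsequent Gaussian smoothing is meant to consolidate.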


The CN Expert maps the edges of the lighter closed shapes in the image to peaks in C[x,y]. That
is, the edges of a light convex shape will cast votes via c() in the vicinity of the center of the shape.
Convolution with a Gaussian function is used to cluster the votes further about the center of the
CN. An integer array of weights is initialized using the following Gaussian function. Two
implementational features of the circle detector are that it operates on only a few scanlines at a
time and that it is performed using integer weights to minimize floating point overhead. A sparse,
non-linear, convolution is performed to restrict the processing to points of probable interest; that
is, the pixel value at the center of the convolution template must be greater than a specified
threshold if the convolution is to be performed at that pixel.


This array of weights is used to compute S_16[x:X, y:Y],

S_16[x:X, y:Y] = s(C_8[x:X, y:Y], radius, resolution)

where s() is given by:

\[ S_{16}[x, y] = \int_X \int_Y C_8[\alpha, \beta]\; g[x - \alpha,\ y - \beta]\; d\alpha\, d\beta \]

radius - the radius of the sought-after nodule in the original image.
resolution - rescales radius; the input image is already rescaled.

\[ g[x - x_0 : 2r,\ y - y_0 : 2r] = \begin{cases} \exp\!\left(-\pi\left[(x - x_0)^2 + (y - y_0)^2\right] / area^2\right), & |x - x_0| < r \text{ and } |y - y_0| < r \\ 0, & \text{otherwise} \end{cases} \]
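
A sketch of the integer-weighted, sparse smoothing step is given below. It is an illustration under assumptions rather than the program used here: the text does not define the quantity "area", so it is treated as a free width parameter, and the integer scale factor (16), the function names, and the handling of the image border are hypothetical.

```python
import math


def gaussian_weights(r, area, scale=16):
    """Integer template of scale * exp(-pi * d^2 / area^2), non-zero only
    for |x - x0| < r and |y - y0| < r, i.e. (2r - 1) x (2r - 1) entries."""
    offsets = range(-(r - 1), r)
    return [[int(round(scale * math.exp(-math.pi * (dx * dx + dy * dy) / area ** 2)))
             for dx in offsets] for dy in offsets]


def smooth_sparse(acc, weights, threshold):
    """Weighted sum of neighbours, computed only where acc[y][x] > threshold."""
    half = len(weights) // 2
    height, width = len(acc), len(acc[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if acc[y][x] <= threshold:
                continue                      # not a point of probable interest
            total = 0
            for j, row in enumerate(weights):
                for i, w in enumerate(row):
                    yy, xx = y + j - half, x + i - half
                    if 0 <= yy < height and 0 <= xx < width:
                        total += w * acc[yy][xx]
            out[y][x] = total
    return out


weights = gaussian_weights(r=2, area=2)
votes = [[0, 0, 0, 0],
         [0, 2, 1, 0],
         [0, 0, 0, 0]]
print(weights)
print(smooth_sparse(votes, weights, threshold=1))
```

Skipping every accumulator cell at or below the threshold is what makes the operation non-linear, and it also means that most of the image is never visited by the template, since only a small fraction of cells receive votes.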

Following accumulator smoothing the highest, nvotes, values and their coordinates in the smoothed
accumulator are located and output in a single pass through the smoothed image. As the image is
sequentially searched for peaks a circular list is maintained. At the head of this list are the value
and coordinates of the largest peak in the image; these were determined by the convolution program
during image smoothing. Any image value that is greater than the value of the last item in the list
is inserted in the ordered list and the last item is deleted. No insertions are made if the coordinates
are within 2r of an item already in the list and if the new peak value is less than the value of that
item. If the coordinates for a peak whose value is about to be inserted in the list are within 2r of an
item already in the list and if the new peak value is greater than the one already in the list, that list
item is deleted and reinserted in a position appropriate to the new peak value. A 4r x 4r area
around each local maximum is set to zero as the peak value and coordinates are inserted in the list.
If any local maximum (in the region being set to zero) is encountered, that maximum and its
coordinates are entered in the list instead. An accumulator list, A, that contains a specified number,
nPts, of CNs is the result of searching S[] with the search algorithm, p.

A[i] = p(S[x,y], nPts, radius)


where A[i] = <a_1, a_2, a_3>

0 ≤ a_1 < X and 0 ≤ a_2 < Y; where X and Y are the bounds of S_16[x:X, y:Y],

a_3 ∈ S_16[x, y].

i = {0, ..., nPts-1}, where nPts is the number of CNs in A.
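
The bookkeeping of the search can be sketched as follows. This is a simplified illustration and not the search program p itself: it keeps an ordinary sorted list instead of a circular list, it omits the step that zeroes a 4r x 4r area, and it interprets "within 2r" as within 2r along both axes. The function name and the tuple layout (x, y, value) are assumptions.

```python
def search_peaks(smoothed, n_pts, radius):
    """Return up to n_pts (x, y, value) tuples, largest value first, with no
    two entries closer than 2 * radius along both axes."""
    peaks = []                                    # kept sorted, largest first
    for y, row in enumerate(smoothed):
        for x, value in enumerate(row):
            if value <= 0:
                continue
            near = [p for p in peaks
                    if abs(p[0] - x) < 2 * radius and abs(p[1] - y) < 2 * radius]
            if any(p[2] >= value for p in near):
                continue                          # a stronger neighbour is listed
            if not near and len(peaks) == n_pts and value <= peaks[-1][2]:
                continue                          # too small to enter a full list
            peaks = [p for p in peaks if p not in near]
            peaks.append((x, y, value))
            peaks.sort(key=lambda p: p[2], reverse=True)
            del peaks[n_pts:]                     # drop the last item if over-full
    return peaks


smoothed = [[0, 5, 0, 0, 0, 0],
            [0, 9, 0, 0, 7, 0],
            [0, 0, 0, 0, 0, 0],
            [3, 0, 0, 0, 0, 8]]
print(search_peaks(smoothed, n_pts=3, radius=1))
```

In the example the value 5 is displaced by its stronger neighbour 9, and the value 3 is later pushed off the end of the list when 8 is found, so the result holds the three strongest mutually separated peaks.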


2.4  -  Performance  Evaluation 


Whether  or  not  nodules  are  present  in  the  chest  film  the  CN  detector  will  report  a  speulied  number 
of  CNs.  Ideally,  the  detector  should  report  any  nodules  that  are  present  in  the  film  in  the  highest 
positions  of  the  list  of  accumulator  peaks.  That  is,  if  there  are  nodules  in  the  film,  they  should 
occupy  the  topmost  slots  (i.e.  have  the  largest  accumulator  values)  in  the  ordered  list  of  accumulator 
votes.  Quite  often  this  is  not  the  case.  Votes  dial  represent  false  positives  are  often  inleispersed 
among  those  that  represent  actual  nodules  in  the  list  of  CNs.  The  efficacy  of  the  detector  is 
dependent  on  the  position  of  the  actual  nodules  in  the  list  of  CNs.  I  he  cumulative  histogiam 
metric  (CUM)  embodies  the  following  rule:  die  closer  the  volts  for  die  actual  nodules  are  to  the 
top  of  the  list  and  the  closer  their  clustenng,  die  better  the  performance  of  the  dctcclor.  The  true 
positive  and  false  positive  rates  are  used  to  characterize  the  performance  of  the  detector.  I  he  true 
positive  rale,  as  it  is  used  in  this  work,  is  defined  as:  the  percentage  of  known  mxlulcs  dial  is 
detected.  The  notion  of  false  positive,  which  is  somewhat  different  from  die  common  concept,  is: 
the  number  of  non-nodules  that  he  between  the  first  accumulator  point  and  the  position  of  the  last 
detected  nodule  in  the  list.  See  1-ig.  2.4.1  for  an  lliusiradon  of  die  calculations  of  true  and  lalsc 
positive  rates,  and  the  detection  metrics. 
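
The two rates defined above can be computed from the ordered CN list once each entry has been marked as a nodule or a non-nodule. The sketch below assumes that each known nodule is matched by at most one list entry; the function name and the use of None when no nodule is detected are hypothetical choices.

```python
def detection_rates(is_nodule, n_known):
    """is_nodule[i] is True when entry i of the ordered CN list matches a
    known nodule; n_known is the number of known nodules in the film.
    Returns (true positive rate in percent, false positive count)."""
    detected = sum(is_nodule)
    true_positive_rate = 100.0 * detected / n_known
    if detected == 0:
        return true_positive_rate, None           # no last detected nodule
    last = max(i for i, hit in enumerate(is_nodule) if hit)
    # Non-nodules from the first accumulator point up to the last detected nodule.
    false_positives = sum(1 for hit in is_nodule[:last + 1] if not hit)
    return true_positive_rate, false_positives


print(detection_rates([False, True, False, False, True, False], n_known=2))
```

In this example both nodules are found (a true positive rate of 100%), but three non-nodules precede the last of them in the list, so a viewer working down the list would inspect three false positives.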

A  CN  is  considered  a  detected  nodule  if  us  coordinates  are  close  (a  definition  ol  close  follows)  to 
those  of  a  known  nodule.  Forty-four  of  die  digitized  films  contain  at  least  one  nodule  (32  ionium 
only  one  nodule,  12  contain  more  than  one  nodule).  All  of  the  films  m  die  ANDS  database  wcie 
obtained  from  Dr.  John  Wandlke  at  Suong  Memorial  Hospital,  lie  specified  the  luenuons  ol  the 
nodules  in  these  films  by  circling  them  on  an  acetate  overlay  which  was  placed  in  icgislcr  with  the 
radiograph.  Later  in  the  computer  vision  lab,  1  specified  the  locauon  of  each  nodule  inti'iai uvcl) 
specified  by  positioning  a  cursor  over  the  the  nodule  in  a  display  of  die  digital  image,  an  overlay 
placed  in  register  with  us  corresponding  chest  film  was  used  lo  guide  tins  specification.  I  he 
locations  of  the  nodules  are  stored  in  the  header  portion  of  RV  (Rochester  Vision)  images  dial  are  4 
or  5  times  reductions  (per  side)  of  die  original  image.  The  ctileiion  dial  is  used  by  the  detecuon 
metric  to  determine  if  a  CN  is  dose  to  the  location  of  a  known  nodule  is: 
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ifl  Dist(accx*scale.  accy'scale,  knownX,  knowrY)  <  Allowablelrror ) 
return  NODUI.L  DETECTED; 


where:

Dist() = Euclidean distance between two points in 2-space.

scale = scaling between the accumulator image and the image which contains the coordinates of the
known nodules.

[accX, accY] and [knownX, knownY] = coordinates of the CN and the known nodule, respectively.

AllowableError = (radius*scale) + ZoomLocError.

radius = radius value that was used by the CN center locator.

ZoomLocError = 2 (the amount of error, in pixels, allowed when interactively locating the center of
the nodule in the reduced image).
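
The closeness criterion can be sketched in a few lines of Python. This is only an illustration of the test as defined above, not the original ANDS code; the function name and argument layout are assumptions, while the variable meanings follow the definitions in the text.

```python
import math

ZOOM_LOC_ERROR = 2  # pixels, the value given in the text


def is_detected(acc_xy, known_xy, scale, radius, zoom_loc_error=ZOOM_LOC_ERROR):
    """Return True if a CN (accumulator coordinates) is close to a known nodule.

    acc_xy   -- (accX, accY), CN position in the accumulator image
    known_xy -- (knownX, knownY), nodule position in the reduced image
    scale    -- scaling between the accumulator image and the reduced image
    radius   -- radius used by the CN center locator
    """
    acc_x, acc_y = acc_xy
    known_x, known_y = known_xy
    allowable_error = radius * scale + zoom_loc_error
    # Euclidean distance after bringing the CN into the known-nodule frame.
    dist = math.hypot(acc_x * scale - known_x, acc_y * scale - known_y)
    return dist < allowable_error
```

With scale = 2 and radius = 5 the allowable error is 12 pixels, so a CN whose scaled position lies about 6 pixels from a known nodule counts as detected, while one 20 pixels away does not.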


The CHM (Cumulative Histogram Metric) and the true positive rate are used to assess the
performance of ANDS. The value of the CHM reflects the placement of nodules in A. It is
defined on [0, 1). The CHM is the area of the difference between an ideal cumulative frequency
histogram, c*, and the experimentally obtained cumulative frequency histogram, c (derived from A),
of accumulator votes. The abscissa of the histogram, h, from which the cumulative frequency
histogram is derived is the location (actual position) of the detected nodules and the ordinate is in
{0, nNods^-1}. That is, the presence of a nodule in A is marked by a delta function with measure
(roughly similar to area) nNods^-1, where nNods is the number of nodules that are known to lie in
the film.

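$$h^*[i] = \begin{cases} nNods^{-1}, & \text{if } i < nNods \\ 0, & \text{otherwise} \end{cases} \qquad i = \{0, \ldots, nNods-1\}$$

$$c^*[i] = \sum_{j=1}^{i} h^*[j]$$

$$h[i] = \begin{cases} nNods^{-1}, & \text{if } A[i] \text{ represents a nodule center} \\ 0, & \text{otherwise} \end{cases} \qquad i = \{0, \ldots, nNods-1\}$$

$$c[i] = \sum_{j=1}^{i} h[j]$$

$$c[i] \le c^*[i], \quad \forall i$$

$$h^*[i],\ c^*[i],\ h[i],\ c[i] \in [0, 1]$$

$$CHM = \left( \sum_{i=1}^{LastNod} c^*[i] - c[i] \right) / LastNod$$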


Fig. 2.4.1a illustrates A, the list of accumulator votes (the statistics above the list of accumulator
values were produced by the performance evaluation program). Fig. 2.4.1b illustrates the histogram
of accumulator votes and the ideal histogram for an image with 2 nodules. Fig. 2.4.1c illustrates the
cumulative histograms derived from the histograms in Fig. 2.4.1b which are used to compute the
CHM. A further indication of performance is obtained when the CHM and TP rate are plotted
with the CHM as the ordinate and the TP rate as the abscissa, Fig. 2.4.1d. This metric, DM (distance
metric), is the distance between the point [TP, CHM] and the point of ideal performance, [1, 0].
It is a simple Euclidean distance:

$$DM = \sqrt{(1 - TP)^2 + CHM^2}$$
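
The CHM, the TP rate, and DM can be computed directly from a ranked accumulator list once each entry has been marked as nodule or non-nodule. The Python sketch below assumes that the sums run over the first LastNod list positions, as in the CHM formula, and that DM is measured from the ideal point [TP = 1, CHM = 0]. The function name and the input representation (a list of booleans in rank order) are illustrative and are not taken from the ANDS evaluation program.

```python
import math


def evaluate_list(is_nodule, n_known):
    """Return (false_positives, tp_rate, chm, dm) for a ranked accumulator list.

    is_nodule -- list of booleans, one per accumulator entry, best entry first
    n_known   -- number of nodules known to lie in the film (nNods)
    """
    hits = [i for i, flag in enumerate(is_nodule) if flag]
    tp_rate = len(hits) / n_known
    if not hits:
        return 0, tp_rate, None, None  # CHM is undefined without a detection
    last_nod = hits[-1] + 1            # 1-based position of the last detected nodule
    step = 1.0 / n_known               # measure of one delta function
    c_ideal = c_actual = area = 0.0
    for i in range(last_nod):
        if i < n_known:                # ideal list: all nodules ranked first
            c_ideal += step
        if is_nodule[i]:
            c_actual += step
        area += c_ideal - c_actual     # area between the cumulative histograms
    chm = area / last_nod
    false_positives = last_nod - len(hits)
    dm = math.hypot(1.0 - tp_rate, chm)
    return false_positives, tp_rate, chm, dm
```

As a usage example with hypothetical positions: for a 50-entry list with 2 known nodules found at ranks 4 and 26 (counting from 0), the sketch reports 25 false positives, a TP rate of 1.0, and CHM = DM = 14.5/27, about 0.537.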
[Program output: statistics followed by the list of 50 accumulator peaks, ranks 0 to 49, each printed
as "rank. Acc[x, y] = votes". The statistics are:]

0.5370  ; DISTANCE Metric
0.5370  ; Cum Hist Metric
25.0000 ; Number of false positives - # of points which are not nodules, that lie between the
        ; first accumulator point and the last detected nodule.
1.0000  ; Percentage of the 2 known nodules which were detected.
50      ; Number of points in accumulator.
0.5000  ; Percentage of positives in 1-th group of 10 accumulator points.
0.0000  ; Percentage of positives in 2-th group of 10 accumulator points.
0.5000  ; Percentage of positives in 3-th group of 10 accumulator points.
0.0000  ; Percentage of positives in 4-th group of 10 accumulator points.
0.0000  ; Percentage of positives in 5-th group of 10 accumulator points.

Figure 2.4.1a - The list of accumulator peaks that is produced after searching the smoothed accumulator.
The nodules that were detected are indicated by an asterisk; 2 nodules are known to exist in the film from
which this list was derived.

[Figures 2.4.1b-d: plots; abscissa: position in accumulator list.]


2.5  -  Incorporation  of  AI  to  Reduce  the  Number  of  False  Positives 


Artificial intelligence techniques have been incorporated into the ANDS with the goal of reducing
the false positive rate. Fig. 2.5.1 shows a display of 64x64 windows of the top 16 CNs in an
accumulator list. Several of the CN images are clearly not nodules. These CNs are false positives.
A pattern classifier was taught to recognize the following eleven classes of CNs: Distinct Rib (DR),
Small Nodule on Rib (SR), Small Vascularity (SV), Large Vascularity (LV), Small Nodule (SN),
Medium Nodule (MN), Large Nodule (LN), Lateral Border (LB), Medial Border (MB), Small Nodule
on Border (SB), and Nipple (NI). A twelfth label, Undetermined (UD), was used for ambiguous CNs.
The incidences of each of these classes are given in Table 2.5.1; these were derived from the
classifications of all trained (that is, the classifications were explicitly taught) films.


CLASS                      % OF ALL CNs

Rib*                        7.8
Small nodule on rib         0.5
Small vascularity*         17.8
Large vascularity*          8.2
Small nodule                1.1
Medium nodule               1.9
Large nodule                0.4
Lateral border*             9.0
Medial border*             21.9
Small nodule on border      0.4
Nipple                      0.5
Undetermined*              30.5

Table 2.5.1 - The incidence rates of CN classes. Classes that are considered false positives are indicated
with an asterisk. These percentages were derived from all taught CNs: the CNs in all 37 films, each
processed at two radii (5 and 10 pixels), which were taught, that is, individually classified by a trained
human. Not all CNs were explicitly classified, because 64x64 windows centered around the CN could not be
made (because the CN is too near the image border), or because the nodule statistics could not be
computed (because the CN has a strange appearance). These data come from 2,750 CNs. The CNs that do
not fit well into any class are taught as Undetermined. Note: the pattern classifier does not classify
nodules as Undetermined; this classification was instituted so that ambiguous CNs would not be used to
train the pattern classifier.


Figure 2.5.1 - The top 16 CNs from an accumulator list. Several of the displayed CNs are not nodules.
Image #0 is a nodule, #1 a rib, #2 a medial border, #1*45 vascularity, and #12 a lateral border. The
classification of each false positive is given in the bottom of the window. These are the classifications that
were presented to BMDP7M (a commercial statistical package for linear discriminant analysis) when tuning
the pattern recognizer.


Artificial intelligence techniques have been incorporated in ANDS in the Nodule Expert. The
Nodule Expert is essentially a pattern classifier [Duda, Hart, 1972] with a set of classification rules.
These rules determine if a CN is to be omitted from the list of CNs that are presented to the
radiologist. This rule causes omission of everything that is not classified as a nipple or some kind of
nodule. The pattern classifier uses features which describe the appearance of a CN; the output of
two vision experts, the Rib Expert and the Vascularity Expert; and the position of the CN in the
radiograph to classify the CN.
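
The omission rule amounts to a filter on the class label assigned by the pattern classifier. The sketch below is a hypothetical illustration: the two-letter labels are the class abbreviations used in this section, the set of retained classes follows the rule stated above (nipple or some kind of nodule), and the data layout is an assumption.

```python
# Classes kept for the radiologist: every kind of nodule, plus nipple.
RETAINED_CLASSES = {"SN", "MN", "LN", "SR", "SB", "NI"}


def sites_for_radiologist(classified_cns):
    """Keep only CNs whose class is a nodule class or nipple.

    classified_cns -- list of (cn, label) pairs in accumulator order,
                      where label is a two-letter class abbreviation
    """
    return [(cn, label) for cn, label in classified_cns if label in RETAINED_CLASSES]
```

For example, a list labeled DR, SN, MB, NI, SV would be reduced to the SN and NI entries, with their original order preserved.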


The Rib Expert is based on the Hough technique for line detection. Its input is a windowed
region around the center of a CN. This image is histogram equalized and smoothed
(high frequency components and noise are removed). The rib expert uses embedded knowledge
about the appearance of the rib for guidance as it attempts to reject or accept the image as that of a
rib. Salient features of the sought-after object are incorporated into this vision procedure. The
following features that characterize ribness are embedded in the Rib Expert algorithm:


- a rib is a light object bounded by two parallel edges;

- by convention of the Sobel edge operator, the angular orientations of the rib edges are
separated by 180 degrees;

- the width of the rib is approximately the diameter of the sought-after nodule;

- the rib edges are approximately centered around the center of the CN;

- the parallel rib edges are the strongest (gradient magnitude) of all edges near the CN;

- these edges are also the most extensive edges in the (windowed) image.

The Rib Expert is procedural in that, given the embedded description of the rib, it iterates, having
the goal of accepting the window image as a rib (segment), rejecting it as a rib, or failing to accept
it as a rib. Knowledge about the salient rib features is embedded in both the control structure and
the body of the executing statement of the rib expert. An increasing number of image edges are
considered as possible rib edges in the control loop. The executing statement tests these edges at
two levels. First, a test is performed that attempts to reject the CN as a rib. If that test does not
reject the image, the second test attempts to accept the CN as a rib. If neither test is passed then
the rib expert iterates further, considering more edges as possible rib edges. The procedural iteration
fails to accept an image as a rib only after the top 20% of all edges in the image have been
considered as possible rib edges and no rib was yet detected in the image; this is the stopping
condition. This stopping condition is based on the notion that the rib edges are the strongest and
most extensive in the image.

A Hough transform for a line is computed separately at each iteration for edges that may constitute
the top and bottom edges of a rib. Since a rib is essentially horizontal, the top and bottom edges
are easy to compute. Each edge, with angular orientation φ, whose gradient magnitude is greater
than the specified threshold is considered in the Hough transform, H, whose output, H[θ:360,
ρ:4*radius], represents lines that are described by the parametric equation:

$$\rho = (x - x_0)\cos(\theta) + (y - y_0)\sin(\theta)$$

where:

[x_0, y_0] is the computed center of the CN,
θ = 90° + φ
This parametric line is illustrated in Fig. 2.5.2. For each edge, parametric distances, ρ, are computed
over a small range of θ (±5°), and H[θ, ρ] is incremented at the appropriate coordinates. Thus,
the most predominant line(s) in the image will be represented by the highest valued coordinates in
H. After all edges have incremented the parameter space, two histograms, h_t and h_b, are derived
from the parameter space image; these are frequency histograms of the angular orientation of edges.

$$h_t[\theta] = \sum_{\rho} H[\theta, \rho], \quad \text{for } 0 \le \theta < 180$$

$$h_b[\theta] = \sum_{\rho} H[\theta, \rho], \quad \text{for } 180 \le \theta < 360$$

These frequency histograms are normalized. These normalized frequency histograms are subjected
to peak detection to determine whether or not the image is that of a rib. The image is rejected as a
rib if there is more than one peak in either histogram that is greater than 53%. An image is
accepted as a rib only if there is one peak in each histogram greater than 55% and if these peaks are
within 180 ±15° of each other. A similar procedure is done over ρ. If both tests, θ and ρ, are passed
then the image is accepted as that of a rib.
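
The θ part of this procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the original implementation: edge orientations are integer degrees, ρ is offset by half the ρ range so that negative distances can be stored, histograms are normalized by their maximum, a "peak" is taken to be a contiguous run of bins above the threshold, a single 55% threshold is used for both tests, and the corresponding ρ test is omitted. All function names are hypothetical.

```python
import math


def hough_rib_histograms(edges, center, radius, mag_threshold, spread_deg=5):
    """Accumulate H[theta][rho] and return the angle histograms (h_t, h_b).

    edges -- iterable of (x, y, phi_deg, gradient_magnitude)
    """
    x0, y0 = center
    n_rho = 4 * radius
    H = [[0] * n_rho for _ in range(360)]
    for x, y, phi_deg, mag in edges:
        if mag <= mag_threshold:
            continue
        for d in range(-spread_deg, spread_deg + 1):
            theta = (90 + phi_deg + d) % 360
            t = math.radians(theta)
            rho = (x - x0) * math.cos(t) + (y - y0) * math.sin(t)
            r = int(round(rho)) + n_rho // 2   # assumed offset for negative rho
            if 0 <= r < n_rho:
                H[theta][r] += 1
    h_t = [sum(H[th]) for th in range(0, 180)]
    h_b = [sum(H[th]) for th in range(180, 360)]
    return h_t, h_b


def peaks_above(hist, frac):
    """Centers of contiguous runs whose normalized value exceeds frac."""
    top = max(hist)
    if top == 0:
        return []
    peaks, run = [], []
    for i, v in enumerate(hist):
        if v / top > frac:
            run.append(i)
        elif run:
            peaks.append(sum(run) / len(run))
            run = []
    if run:
        peaks.append(sum(run) / len(run))
    return peaks


def theta_test(h_t, h_b, frac=0.55, tol_deg=15):
    """Return 'reject', 'accept', or 'undecided' for the angle test."""
    p_t, p_b = peaks_above(h_t, frac), peaks_above(h_b, frac)
    if len(p_t) > 1 or len(p_b) > 1:
        return "reject"
    if len(p_t) == 1 and len(p_b) == 1:
        separation = (p_b[0] + 180) - p_t[0]   # h_b bins start at 180 degrees
        if abs(separation - 180) <= tol_deg:
            return "accept"
    return "undecided"
```

In the full procedure described above, an "undecided" result would send the Rib Expert back into its control loop with more edges admitted, until the top 20% of edges have been considered.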
Figure 2.5.3 - The result of running the Rib Expert on a CN that is a rib. The red and green areas on the
left are the Hough transform space for the top and bottom edges of the rib, respectively. The abscissae of
the histograms below each is angular orientation. The ribs are detected because there are only two such
peaks and the peaks are within 180±5° of each other. The red and green lines that are drawn along the
rib edges correspond to the top and bottom rib edges that were detected by the Hough transform.

[Screen display: demonstration of the Rib Expert. The histograms represent the frequency of edge angles
(axes: frequency versus angle); if a rib is detected, its borders are outlined.]
In summary, the Rib Expert uses knowledge of the appearance of the rib to detect a rib. The
embedded knowledge determines the process through which the data are reduced. Figs. 2.5.3 and
2.5.4 illustrate the results of running the Rib Expert on a rib, which is recognized as such, and on a
nodule, which is rejected as a rib. Angular orientation and brightness constrain the rib edge to
which an edge element might belong. The rib model, which requires parallel and extensive rib
edges, constrains the allowed angular orientation and requires that the preponderant edges in the
image fit the model of rib edges. These appearance constraints are used in conjunction with a line
detection technique and some procedurally embedded rules to reduce rib detection to peak finding
in 1-dimensional arrays. The effectiveness of the expert depends on the threshold levels used in the
peak-finding operations as well as the sophistication of the embedded knowledge. In
preliminary tests, 85% of all rib images (19 were tested) were correctly identified; no false positives
were reported.


Figure 2.5.5 - The Vascularity Expert. A back-projected Hough transform is used to measure linear
clustering of vascular structures. The rectangular regions near the mediastinum in both lungs delimit the
areas that would contain CNs that are considered by the Vascularity Expert. The CNs that are considered
as candidate vascular structures are marked by red dots. Lines whose orientations correspond to those of
hypothesized anatomic structures are drawn between candidate vascularity sites. The intensity value of each
of these lines corresponds to the number of vascularity points that lie on the line and the number of lines
that pass through the candidate vascularity site.


The Vascularity Expert provides a measure of colinearity of CNs in a region near the mediastinum
of both lungs. The assumptions here are that branching vascular structures near the mediastinum
will present circular shadows where the branches are imaged end-on and that the branching
structures are somewhat linear. Rectangular regions near the mediastinum enclose the CNs that are
considered in this test, see Fig. 2.5.5. A Hough transform for a line is computed for each CN in
these regions; each region has a separate transform space. The Hough transform is computed over
a range of angles. This range is constrained by the possible angular orientations of vascular
structure in each lung. Thus, a line (in transform space) will receive as many votes as there are
CNs lying on it. The results of this transform are then back-projected to determine a value of
colinearity for each CN. Each CN, as a result of the back transform, is assigned a weight that
corresponds to the total number of points on all lines (colinear CNs) that pass through it.
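
A minimal Python sketch of the vote-and-back-project idea is given below. It is an assumption-laden illustration rather than the original code: the angle step, the ρ bin width, and the rule that a line must hold at least two CNs before it contributes to a weight are all hypothetical choices, and θ denotes the direction of the line normal.

```python
import math
from collections import defaultdict


def vascularity_weights(points, theta_range_deg, theta_step=5, rho_bin=2.0, min_points=2):
    """Return one colinearity weight per CN in a mediastinal region.

    points          -- list of (x, y) CN centers inside one rectangular region
    theta_range_deg -- (low, high) range of line-normal angles to consider
    """
    cells = defaultdict(set)          # (theta, rho bin) -> indices of voting CNs
    low, high = theta_range_deg
    for idx, (x, y) in enumerate(points):
        for theta in range(low, high + 1, theta_step):
            t = math.radians(theta)
            rho = x * math.cos(t) + y * math.sin(t)
            cells[(theta, round(rho / rho_bin))].add(idx)
    # Back-projection: each CN collects the vote counts of the lines through it.
    weights = [0] * len(points)
    for members in cells.values():
        if len(members) >= min_points:
            for idx in members:
                weights[idx] += len(members)
    return weights
```

With three CNs lying on a common 45° diagonal and one isolated CN, the three colinear CNs each receive a weight of 3 and the isolated CN a weight of 0 when the angle range covers the diagonal's normal direction (135°).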

Linear discriminant analysis is incorporated in the pattern recognizer to classify CNs. BMDP7M (a
commercial statistics package for doing linear discriminant analysis, which is available on the
Medical Center Computing Facility's DEC-10) was chosen to perform the discriminant analysis.
There are two aspects of the pattern recognition process: system training and recognition. In the
training phase the pattern classifier is presented with feature vectors that typify the classes of CNs.
In this phase the pattern classifier develops a multivariate statistical model of the classes and
computes a linear function for classifying CN feature vectors. The pattern classifier was first taught
using 295 24-element vectors (23 feature values and a classification value) from 9 films (it was later
taught with 2750 feature vectors from 37 films); BMDP7M computes a set of weights, w_i, and
constants, c_i, which are applied to a CN feature vector, x, to determine the discriminant weights,
d_i(x), for the i-th class. Table 2.5.2 describes the CN features that were input to BMDP7M.

$$d_i(x) = x' w_i + c_i$$

The class with the largest computed discriminant value is the class to which the CN belongs. The
set of weights, w_i, and constants, c_i, that are provided by BMDP7M are instantiated in the pattern
classifier which is part of the Nodule Expert.
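
The recognition step is an argmax over the linear discriminant functions. The Python sketch below illustrates it with made-up weights; the real w_i and c_i come from BMDP7M and are not given in the text, so the two-feature values in the usage note are purely hypothetical.

```python
def classify(x, weights, constants):
    """Return (best_class, scores) using d_i(x) = x' w_i + c_i.

    x         -- feature vector (sequence of numbers)
    weights   -- dict mapping class label to its weight vector w_i
    constants -- dict mapping class label to its constant c_i
    """
    scores = {
        label: sum(xi * wi for xi, wi in zip(x, w)) + constants[label]
        for label, w in weights.items()
    }
    best = max(scores, key=scores.get)   # class with the largest discriminant value
    return best, scores
```

For instance, with hypothetical weights {"SN": [1.0, 0.0], "DR": [0.0, 1.0]} and constants {"SN": 0.0, "DR": -0.5}, the vector [0.2, 0.9] scores 0.2 for SN and 0.4 for DR and is therefore assigned to DR.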


FEATURE                                  DESCRIPTION

Accumulator Value^{8, 37}
    represents the number of votes that were cast for a CN

Relative Medial Distance^{8, 37}
    shortest distance between the CN and the medial border; normalized by the sum of the
    medial and lateral distances

Relative Lateral Distance^{8, 37}
    shortest distance between the CN and the lateral border; normalized by the sum of the
    medial and lateral distances

Relative Central Distance^{8, 37}
    distance between the CN and the medial border midpoint, normalized by the distance
    between the medial border midpoint and the top or bottom apices, if the nodule is
    closer to the top or bottom, respectively

Vascularity Weight
    a measure of colinearity of CNs that lie near the mediastinum; this value is
    proportional to the number of nodules that lie on the line(s) that pass through a
    given nodule

Average^{37}/St. Dev. Pixel Value
    azimuthal averages are computed for all octants; these statistics derive from the
    azimuthal averages (average pixel value of an arc in each octant) per octant

Average^{8, 37}/St. Dev. Radius
    statistics on pixel-value boundary points; radial distance in cm

Hough Radius^{37}
    radius (in pixels) used by ANDS when searching for CNs

Average/St. Dev. dR at Gradient Boundary
    distance between the gradient magnitude peak (in cm) and 50% of that peak value
    in a histogram of azimuthally averaged gradient magnitude, per octant; relates to
    edge contrast

Average/St. Dev.^{37} dG at Gradient Boundary
    change in gradient magnitude between peak gradient magnitude and 50% of that
    peak; relates to edge contrast

Average/St. Dev.^{8, 37} Edge Strength
    ratio between dG and dR (above) over all octants; relates to edge contrast

Average^{37}/St. Dev.^{8, 37} Edge Visibility
    ratio between the maximum gradient magnitude for each octant and the minimum
    gradient magnitude over all octants; describes the uniformity of the edge gradient
    around the CN

Avg.^{37}/St. Dev. Change in Int/Ext. Brightness
    change between interior and exterior brightness at the pixel value boundary, over all
    octants

Avg./St. Dev.^{37} Ratio between Ext/Int. Brightness
    ratio between average interior and exterior pixel value across the pixel value
    boundary, over all octants

Rib Expert Value^{37}
    value returned by the rib expert; essentially a boolean value

Table 2.5.2 - Descriptions of the 23 CN features that were input to BMDP7M in the training phase. The
superscripts, 8 and 37, indicate which features were used when training the pattern classifier in the 8- and
37-trainings, respectively.
The nodule appearance features characterize essentially two aspects of the CN: brightness and
margination. The CN appearance features are computed using knowledge about the locations of two
types of CN boundary: the pixel value and the gradient magnitude boundaries. The pixel value
boundary is determined as a side effect of having computed the nodule center. The CN Expert
gives a rough estimate of the center of the CN. A center finder refines the estimate of the center
that is located by the CN Expert.


The CN center finder locates 8 points on the CN border. These points are 45 degrees apart, with
respect to the CN center. If one of these points cannot be computed then the features of the CN
are not determined. Each pixel value boundary point represents the region between the inside and
the outside of the CN. Each point is determined separately. The boundary point is essentially the
point of inflection of the change in pixel value of adjacent pixels on a radial arm. This point is
determined as the peak in a cumulative histogram of the differences between adjacent pixels (on the
radial arm) which are indexed by radial distance, see Fig. 2.5.7. The radial arms are the radii which
divide the CN into equal-sized octants; one of the arms has an angular orientation of 0° with the
horizontal, see Fig. 2.5.6. The average radius, changes between the inside and outside brightness of
the nodules, and average pixel value are features that are determined using this boundary. These
essentially describe the light/dark properties of the CN and the distinction of the CN from the
surround.


Figure 2.5.6 - The partitioning of the candidate nodule. The CN is divided into equal-sized octants by the
radial arms. The pixel value boundary point is defined on each radial arm. It delimits the inside from the
outside of the nodule based on brightness considerations. The gradient magnitude boundary is defined as a
radial distance of an arc in each octant. This boundary delimits the inside from the outside of the CN
based on sharpness considerations (per octant).
Figure 2.5.7 - Computation of the pixel value boundary. The pixel value boundary is a set of points on the
radial arms. Each boundary point is computed with a peak finding procedure, which is illustrated above.
First, radial slope is determined along a radial arm. This slope is defined as: slope = I[x-dx, y-dy] -
I[x+dx, y+dy]. It is derived in a smoothed image. Next, the relative cumulative radial slope (the
normalized cumulative area under the plot of radial slope) is computed. The location of the pixel value
boundary is derived from this plot. The boundary point is defined as the first peak greater than a
minimum threshold, or the first radial point that is just greater than a maximum threshold.


Figure 2.5.8 - Computation of the gradient magnitude boundary. The gradient magnitude boundary is
composed of arcs, one per octant, that are derived from peaks in a histogram of azimuthal averages of
gradient magnitude. These arcs are illustrated at the top left. Two octants, 0 and one other, contain two
arcs. The radial locations of these arcs correspond to the locations of the (at most two) largest relative
gradient magnitude peaks for the respective octants. These peaks are pictured in the upper right. All the
other octants have only one significant peak in their gradient magnitude histograms (which are not
illustrated), since they contain only one arc. A recursive procedure is used to determine the most consistent
gradient magnitude boundary. A consistent boundary is one in which the total radial distance between
adjacent candidate boundary arcs is minimized. The candidate boundary is represented by the graph on the
bottom left. Nodes represent candidate boundary arcs. Edges represent the radial distance between adjacent
boundary arcs. The recursive procedure prunes arcs that do not lead to a minimal cost path; this produces
a consistent gradient magnitude boundary.


The gradient magnitude boundary is determined in a similar fashion to the pixel value boundary.
However, the gradient magnitude boundary (for each octant) is determined from a histogram that is
obtained from the azimuthally averaged gradient magnitude in each octant. This boundary is
determined as the minimal cost path through the (at most two) peaks in the azimuthally averaged
gradient magnitude histograms for each octant, see Fig. 2.5.8. That is, the minimal cost path results
in a boundary in which the radial boundary distance per octant is consistent with those of its two
adjacent neighbors. The gradient magnitude boundary lies on or beyond the pixel value boundary.
For sharp edges it lies on the pixel value boundary and for fuzzy edges it lies radially beyond it.
Edge strength, edge visibility, and change in edge gradient are features that are determined using
the gradient value boundary. These describe the definition of the CN margin or its
separateness from the surround.
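
The minimal cost path can be illustrated with a small Python sketch. It swaps the recursive pruning procedure described in the text for an exhaustive search, which is practical because there are only 8 octants with at most two candidate arcs each (at most 256 combinations). The cost is assumed to be the sum of absolute radial differences between neighboring octants, with the last octant adjacent to the first; the function name is hypothetical.

```python
from itertools import product


def consistent_boundary(candidates):
    """Pick one candidate radius per octant, minimizing the total radial jump.

    candidates -- list with one entry per octant; each entry is a list of the
                  (at most two) candidate boundary radii for that octant
    """
    n = len(candidates)
    best_cost, best_path = None, None
    for path in product(*candidates):
        # Octants form a closed ring, so octant n-1 is a neighbor of octant 0.
        cost = sum(abs(path[k] - path[(k + 1) % n]) for k in range(n))
        if best_cost is None or cost < best_cost:
            best_cost, best_path = cost, path
    return list(best_path), best_cost
```

Given two octants that each offer a near and a far candidate while the other six offer a single radius of about 5 to 6, the search keeps the near candidates, since they agree with their neighbors.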

In addition to the nodule appearance features, several relative distance features were devised for use
in the pattern recognizer. They complement the knowledge about local features by adding global
knowledge about the relative position of the nodule in the image. These relative distances are:
central distance (from the middle of the medial border, see Fig. 2.5.11, to the nodule); medial
distance (from the medial border to the nodule); and lateral distance (from the lateral border to the
nodule). The central distance is normalized by the distance between the medial center and the top
of the lung, or the distance between the medial center and the bottom of the lung, depending on
whether the nodule is in the upper or lower portion of the lung, respectively (see Fig. 2.5.11).
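
The three relative distances can be sketched as below. This is a simplified illustration: the medial and lateral distances are taken horizontally at the height of the CN and normalized by their sum (the lung width at that height), image rows are assumed to increase downward, and all parameter names are hypothetical.

```python
import math


def relative_distances(cn, medial_x, lateral_x, medial_mid, top, bottom):
    """Return (relative medial, relative lateral, relative central) distances.

    cn                   -- (x, y) center of the candidate nodule
    medial_x, lateral_x  -- border columns at the height of the CN
    medial_mid           -- (x, y) midpoint of the medial border
    top, bottom          -- (x, y) top and bottom apices of the lung
    """
    x, y = cn
    d_med = abs(x - medial_x)
    d_lat = abs(x - lateral_x)
    width = d_med + d_lat
    # Normalize the central distance by the upper or lower half of the lung.
    ref = top if y < medial_mid[1] else bottom
    d_central = math.hypot(x - medial_mid[0], y - medial_mid[1])
    d_ref = math.hypot(ref[0] - medial_mid[0], ref[1] - medial_mid[1])
    return d_med / width, d_lat / width, d_central / d_ref
```

A CN midway between the two borders therefore has relative medial and lateral distances of 0.5 each, whatever the absolute lung width.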

Specific locations in the lungs must be known so that the relative distance measures can be
determined. These locations are the medial and lateral borders, the top and bottom apices, and the
medial midpoint of each lung, Fig. 2.5.10. First, the lungs must be located in the image. Locating
the lungs is aided by the facts that the non-lung area of the image was masked out when copying,
and that the lung image has the correct orientation. The non-lung region is known to contain pixel
values that are lower than any in the lung region. A line midway between the medial borders of the
two lungs is located using a projection of the image onto the horizontal axis. This line is used when
computing the lung borders and relative distance features to distinguish the right from the left lung.
Successive horizontal lines that lie in the lung parenchyma of each lung are considered when
computing the lung borders, Fig. 2.5.9. The endpoints of the longest horizontal line in each lung
are taken as the lung borders. Two lung border arrays, lateral and medial, are computed by the
lung border detector for each lung. For example, to determine the lateral border at a given vertical
height, the lateral border array is indexed to obtain the horizontal coordinate of that border. The
lung border arrays are strictly positive for all vertical coordinates that are in the lung parenchyma.
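
The border arrays can be sketched in Python as follows. The sketch assumes a masked image in which non-lung pixels hold a background value lower than any lung pixel, a known midline column, and a sentinel of -1 for rows that do not cross a lung; the sentinel value and all names are assumptions made for illustration.

```python
def longest_run(row, lo, hi, background=0):
    """Return (start, end) of the longest run of lung pixels in row[lo:hi], or None."""
    best, start = None, None
    for x in range(lo, hi + 1):                # x == hi flushes a run ending at hi-1
        inside = x < hi and row[x] > background
        if inside and start is None:
            start = x
        elif not inside and start is not None:
            if best is None or (x - start) > (best[1] - best[0] + 1):
                best = (start, x - 1)
            start = None
    return best


def lung_border_arrays(image, x_mid, background=0):
    """Return border arrays, indexed by row, for the lung on each side of x_mid."""
    left = {"lateral": [], "medial": []}
    right = {"lateral": [], "medial": []}
    width = len(image[0])
    for row in image:
        l_run = longest_run(row, 0, x_mid, background)
        r_run = longest_run(row, x_mid, width, background)
        left["lateral"].append(l_run[0] if l_run else -1)
        left["medial"].append(l_run[1] if l_run else -1)
        right["medial"].append(r_run[0] if r_run else -1)
        right["lateral"].append(r_run[1] if r_run else -1)
    return left, right
```

Taking the longest run in each row is what makes the sketch tolerant of small anomalous structures that split a row of lung pixels into several segments.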


Figure 2.5.9 - Computation of lung borders. The location of each lung in the image is computed and the
line, xMid, which is midway between them is derived. Two arrays (one for the lateral border and the other
for the medial border) are computed for each lung. The index of each array is vertical height and the
value contained in the array is the horizontal coordinate of that lung border; the value is -1 if the
vertical height is not within the lung. The endpoints of the longest line segment in each lung are taken
as the respective lung borders, when computing the lung borders. Two segments are illustrated in the left
lung, above. These are likely to arise at the lung border because of anomalous structures. Here, a_j is
longer than b_j, so its endpoints are determined to be the horizontal boundaries of the right lung at y_0 by
the lung border locator.
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Figure  2.S.10  -  Computed  lung  locations.  The  lateral  and  medial  borders  or  each  long  arc  tiiin|«iiul  h>  the 
lung  border  locator;  from  these,  the  heights  and  widths,  and  tops  and  bonoms  ol  each  lung  are  lictinnined 
by  the  border  locaror  as  well  as  the  medial  midpoints  of  each  lung 
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Figure  2.5.11  -  Relative  lung  distance  features  that  are  used  by  tbe  pattern  classifier.  The  telaim  Ijieial 
and  medial  distances  are  those  distances  from  the  respective  lung  borders  to  the  CN  that  arc  normalized  by 
tbe  width  of  the  lung  al  the  height  of  the  CN.  The  relative  central  distance  is  thai  distance  between  die 
nodule  and  the  medial  border  midpoint  normalized  b>  the  distance  between  the  medial  midpoint  and  ihe 
top/bottom  of  the  lung,  depending  on  whether  die  nodule  is  in  Use  tup  ur  boiium  half  of  the  lung, 
respectively. 
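
The three relative distance features in the caption of Figure 2.5.11 can be sketched as below. This is an illustration under assumptions that the text does not settle: the nodule-to-midpoint distance is taken as Euclidean, the midpoint-to-top/bottom distance as vertical, and image rows increase downward. The function and argument names are hypothetical.

```python
import math


def relative_distances(cn, lateral, medial, top, bottom, medial_mid):
    """Return (relative lateral, relative medial, relative central) distances.

    cn         -- (x, y) of the candidate nodule
    lateral    -- lateral border array indexed by vertical height
    medial     -- medial border array indexed by vertical height
    top/bottom -- vertical coordinates of the top and bottom of the lung
    medial_mid -- (x, y) of the midpoint of the medial border
    """
    x, y = cn
    width = abs(lateral[y] - medial[y])  # lung width at the height of the CN
    rel_lateral = abs(x - lateral[y]) / width
    rel_medial = abs(x - medial[y]) / width
    mx, my = medial_mid
    # Normalize by the distance to the top if the CN is in the top half,
    # otherwise by the distance to the bottom.
    reference = (my - top) if y < my else (bottom - my)
    rel_central = math.hypot(x - mx, y - my) / reference
    return rel_lateral, rel_medial, rel_central
```

Because each feature is a ratio, the values are comparable between lungs of different sizes, which is the reason the caption gives for normalizing.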


3  -  Experimental 


The final ANDS consists of the following steps: spline filter, smoothing of the spline
filtered image, detection of CN centers, smoothing, and search for the most prominent CN centers.
Four processing configurations were tested on 5 films to determine which configuration would
provide optimal detection. The parameters (knot, image/accumulator rescaling, image resolution,
and radius) were tested over 5 films to determine which provides optimal detection. This processing
configuration and its parameters constitute the basic ANDS.

Artificial Intelligence in the form of a Nodule Expert that uses a pattern classifier and two
procedurally driven nodule experts (which detect two classes of false positives) is incorporated in
the basic ANDS in order to reduce the number of false positives. Forty-three films were processed
by ANDS with and without AI. The results of the two runs are compared.

The overall goal of the experimental work is to determine a nodule detection method
that best detects nodules, to tune that process, and then to reduce the number of false positives that
are reported by it. The experimental work has four parts: optimization of the linear transfer of
optical densities to pixel values during photographic copying and digitization; choosing a nodule
detection process; tuning the parameters of the nodule detection process; evaluation of the ability of
AI techniques to reduce the number of false positives. A method for photographically reducing and
then digitizing the chest radiograph image that is both linear and repeatable was first devised. The
careful definition of the photographic reduction and digitization methods permits additional chest
films to be added to the current ANDS database without the introduction of batch-to-batch
variation. The linear transfer of optical densities to pixel values assures that nodular abnormalities of
various densities will be represented without degradation in the digital image.


3.1  -  Optimal  Reproduction  of  ANDS  Database 


Fig. 3.1.1 illustrates the generalized photographic/digitization process. Three transfer
functions (camera flare, film characteristic, and digitizer characteristic) determine how optical
densities from the chest radiograph are transformed into pixel values. Ideally, the system transfer
function should represent a linear mapping between optical density and pixel value. The shape of
the flare curve is determined by the amount of light that is reflected within the camera system; the
linearity of this transfer decreases with increasing amounts of internally reflected light. The shape
of the film characteristic is determined by the film type (emulsion) and its development. The
placement of the input densities on the linear portion of the film characteristic is determined by
the exposure. The shape of the digitizer characteristic is determined by the adjustment of the gain
of the A/D converter in the film digitizer.


[Figure 3.1.1: four-quadrant diagram with panels labeled Digitizer System, System Transfer Function,
Film/Development System, and Camera/Lens System; axis labels include Pixel Value and Relative LogE.]

Figure 3.1.1 - The optical density to pixel value transfer function in quadrant I is the result of three
cascaded systems: camera/lens, film/development, and digitizer calibration. The dotted lines represent the
ideal system transfer functions. A linear system transfer function is desirable. An optimal system tone
reproduction characteristic was experimentally determined. The digitizer and film/development characteristics
were preset at optimal levels and the optimal exposure and flare condition were experimentally determined.

A film, Kodak Commercial, that provides a useful linear range of approximately 3.0 logE
and a gamma near 1.0 was chosen for photographic reduction. The development process was fixed
as HC-110-D, at 68 ± 1/4 °F for 5' using R.I.T. tray rock agitation. The digitizer was calibrated with
a 5-inch Kodak #2 step wedge to provide the greatest possible linear range with optimum
discrimination at high (~2.70) densities.

Given that the Optronics was calibrated to produce a near-linear transfer of optical
densities to pixel values and that the film development was fixed to produce a gamma near 1.0, the
conditions that were varied were exposure and flare condition. Three exposures and three flare
conditions were evaluated. These nine exposure/flare combinations were evaluated using three
radiographs whose density ranges (in the lung area) typify the population of radiographs that was
digitized.


In-camera sensitometry was used to determine the approximate exposure to be used when
copying the chest films. The camera system was set up identically when determining this exposure
and when copying the 50 radiographs. The approximate exposure was experimentally determined
by photographing a 10" Kodak #2 step wedge centered on the light table with the luminous area
surrounding the step wedge masked with exposed x-ray film (density » 5). An exposure that
provided a near 1:1 mapping of step wedge densities to developed film densities was chosen. As
each chest radiograph was copied, the 10" step wedge and a tri-bar target were included alongside
the radiograph so that reproduction might be quantitatively assessed.

A discrepancy was noted, however, between the reproduction characteristics derived from
the masked step wedge alone and those derived from the same step wedge imaged alongside a chest
radiograph. The system transfer characteristics that were derived from film samples which were
given identical development and digitized consecutively (no adjustment was made in scanner
calibration between runs) are plotted in Fig. 3.1.2. The discrepancy between the characteristics was
attributed to camera/lens flare and/or ambient light reflecting from the surface of the radiograph
when copying. A tent of black velvet was constructed around the camera system to eliminate
ambient light (both room light and light from the light table that was reflected from the ceiling).




Construction of this tent resulted in only a slight decrease in the noted discrepancy; thus
camera/lens flare was determined to be the primary cause.

The source of the camera/lens flare was presumed to be light passing through the non-lung
area of the chest radiograph, which is noticeably lighter than the imaged lung parenchyma.
Since flare was shown to diminish the reproduction of the higher densities, the non-lung region was
masked with exposed x-ray film. Thus, masking would diminish the amount of light which would
reflect within the camera system and lead to degradation of tone rendition of higher densities.
Whether masking the non-lung region produced a significant reduction in camera flare is the subject
of the following statistical analysis.

Figure 3.1.2 - System transfer curves of the step wedge for the same step wedge exposed at three flare
conditions: n = no flare, f = flare, and m = masked. All samples were developed and digitized under
identical conditions. The differences in the toe portion of the curves are presumably due to camera/lens
flare.


Three chest radiographs were chosen, on the basis of their density ranges (in the lung parenchyma), to
represent the population of radiographs that was to be digitized. Each of these three radiographs
was copied at two flare conditions (masked and not masked) and three exposures (10 sec. at f/16,
f/22, and f/32). The third flare condition that was evaluated represented the ideal case, no flare:
only the step wedge was copied. Images that were produced at the three flare conditions are
illustrated in Fig. 3.1.3. Weighted averages (weights correspond to the relative percent of the
population represented by each film) for each of the 21 steps (in a 3x5 pixel area) were derived
from digitized images of the three films at each flare/exposure condition. A second-order
regression was computed from these 21 averaged steps for each exposure/flare combination,
regressing on pixel value as a function of transmission density.


Figure 3.1.3 - Three flare conditions (no flare, top; masked, left; and flare, right) evaluated to
determine which (masked or not masked) would result in the most linear transfer from optical density in
the chest radiograph to pixel value in its digital representation. Measurements of the step wedge densities
from each of the above images are plotted in Fig. 3.1.2. The objective of this part of the experiment is to
determine if masking the non-lung area would provide better density to pixel value transfer because of the
presumed reduction in the amount of camera/lens flare.

The goal of the regression was to determine which flare condition, masked or not masked,
provided the most linear tone reproduction. Here, linear reproduction is defined as a lack of
statistical significance in the second-order term of the following regression equation:

$$P = b_0 + b_1 D + b_2 D^2$$

where:  P = pixel value,

D = transmissive optical density,

b_0 ... b_2 = coefficients of the regression on D.

The regression analysis was performed using Minitab (Ryan, 1976). Table 3.1.1 summarizes the
regression analysis. The regression was performed over optical densities in the useful range of 0.20
to 2.72; the densities in the lung areas of the films were within this range. The only practical
exposure/flare condition that lacks a statistically significant second-order regression term is
masked at f/22. No trend was evident in the residuals from the regression equation; this
suggests that the proposed model sufficiently represents the data. Masks were cut for each of the 50
films; each was copied at 10 seconds at f/22, given identical development, and digitized on the
calibrated scanner. The calibration of the digitizer was periodically checked by digitizing the #2
step wedge. Calibration was maintained throughout digitization; no recalibration was required.
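
The linearity test described above (a second-order fit followed by a Student's t-test on the quadratic coefficient) can be sketched with the standard library alone. This is an illustrative sketch, not the Minitab procedure used in the thesis; `invert` and `quadratic_fit_t` are hypothetical names, and the fit is ordinary least squares through the normal equations.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def quadratic_fit_t(density, pixel):
    """Fit P = b0 + b1*D + b2*D^2; return (coefficients, t of b2, residual d.f.)."""
    X = [[1.0, d, d * d] for d in density]
    n = len(X)
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * p for r, p in zip(X, pixel)) for i in range(3)]
    inv = invert(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(3)) for i in range(3)]
    sse = sum((p - sum(bi * xi for bi, xi in zip(b, r))) ** 2
              for r, p in zip(X, pixel))
    dof = n - 3                                # 21 steps give 18 d.f.
    se_b2 = math.sqrt(sse / dof * inv[2][2])   # standard error of b2
    return b, b[2] / se_b2, dof
```

With 21 averaged steps the residual degrees of freedom are 18, for which the two-sided 5% critical value of t is about 2.10; a |t| below that value on b2 is what the text calls linear reproduction.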


                 Student's t-statistic
                    Flare Condition

f-stop      No Flare      Masked      Flare

  16         -1.61        -3.47        4.27
  22          0.58         1.28*       6.83
  32          0.36         5.62        9.34

 d.f.          18           13          13

Table 3.1.1 - Student's t-statistics on the second-order term of regression analyses of pixel value as a
function of optical density, quadrant I of Fig. 3.1.1, for three exposures and three flare conditions. As a
result of this analysis, masked and f/22 were used for copying the 50 films in the ANDS database. This
exposure/flare condition was the only practical condition in which the second-order term of the regression is
statistically insignificant.


3.2  -  Choosing  an  Image  Processing  Configuration 


Four  image  processing  configurations  were  tested  with  their  parameters  set  at  fixed  levels 
to  determine  which  resulted  in  optimum  detection  of  the  pulmonary  nodules  present  in  5  films. 
The  tested  configurations  are: 

#1  Spline filter with histogram equalization
    Spline smoothing
    Candidate nodule detection
    Spline smoothing
    Vote accumulation

#2  Spline filter with histogram equalization
    Spline smoothing
    Candidate nodule detection
    Sparse convolution smoothing
    Vote accumulation

#3  Spline filter with histogram equalization
    Candidate nodule detection
    Spline smoothing
    Vote accumulation

#4  Spline filter with histogram equalization
    Candidate nodule detection
    Sparse convolution smoothing
    Vote accumulation

The differences between these methods are spline smoothing following spline filtering and spline or
convolution smoothing following candidate nodule detection. All five films (6, 18, 32, 36, 41) were
processed at the same parameters (resolution: no rescaling between the filtered image and the
accumulator array of candidate nodule centers; radius = 12 pixels; knot spacing = 60; and no
rescaling of the original image (size ~ 1000x1000 pixels)). A weighted sum of three measures (true
positive percentage, false positive fraction (the ratio of the number of non-nodules between the first
accumulator point and the last detected nodule and the number of points in the accumulator list),
and the DCHM) is used in a two-way ANOVA in which the processing configurations are treatments
and the films are blocks. The DCHM is similar to the CHM but is coarser:

$$\mathrm{DCHM} = \sum_{q=0}^{Q-1} (1 - 0.1q)\, h(q)$$

where:

Q = number of quantiles. Here, there are 10 quantiles, each containing 10 accumulator
points.

h(q) = the percentage of all detected nodules per quantile, q.


A summary of this ANOVA is given in Table 3.2.1. The data and ANOVA calculations are
presented in Appendix 9.2.
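
Two of the measures defined above, the false positive fraction and the DCHM, can be sketched as follows. This is a minimal sketch under assumptions: `is_nodule` is a hypothetical list of booleans over the ranked accumulator points (True where a point is a known nodule), h(q) is expressed as a fraction of the detected nodules rather than as a percent, and a list with no detected nodule is scored as all false positives.

```python
def false_positive_fraction(is_nodule):
    """Non-nodules from the first point through the last detected nodule,
    divided by the number of points in the accumulator list."""
    if True not in is_nodule:
        return 1.0  # assumption: nothing detected counts as all false positives
    last = max(i for i, hit in enumerate(is_nodule) if hit)
    non_nodules = sum(1 for hit in is_nodule[:last + 1] if not hit)
    return non_nodules / len(is_nodule)


def dchm(is_nodule, group_size=10, n_quantiles=10):
    """Coarse cumulative histogram metric: sum over q of (1 - 0.1 q) h(q)."""
    detected = sum(is_nodule)
    if detected == 0:
        return 0.0
    total = 0.0
    for q in range(n_quantiles):
        group = is_nodule[q * group_size:(q + 1) * group_size]
        h = sum(group) / detected          # share of detected nodules in group q
        total += (1 - 0.1 * q) * h
    return total
```

A list whose nodules all fall in the first group of ten scores a DCHM of 1.0, and the score drops by 0.1 for each group further down the list that a nodule falls into.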


Source        Sum of Squares     ν     Mean Square      F-ratio

Treatments        0.0415         3     1.383x10^-2       2.01
Blocks            0.9690         4     0.2423           35.19*
Residuals         0.0826        12     6.885x10^-3
Total             1.0931        19

Table 3.2.1 - ANOVA results for processing configurations as treatments and radiographs as blocks. Data
and calculation of ANOVA are given in Appendix 9.2. The effects of film are significant at alpha=0.05
(P~0.001). The variation due to blocks, which is a result of the non-homogeneity of the films, may have
obscured any differences due to treatments.

The contribution of the treatments (processing configurations) to the total variance is not
statistically significant at the α=0.05 level, while the contribution of the blocks (films) is statistically
significant at α=0.05 (P~0.001). The amount of variation among blocks obscures any differences
due to treatments because of the non-homogeneity of the films. Here, non-homogeneity implies:
structures represented in the films, their relative sizes, shapes, and intensities, which vary among
films. One may not draw a conclusion founded on this statistical analysis that the differences
among the processing configurations are significant.
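
The two-way ANOVA with configurations as treatments and films as blocks follows the standard randomized-block decomposition, which can be sketched as below. This is an illustration only: `scores[t][b]` is a hypothetical table holding the weighted performance measure for treatment `t` on block `b`, and the actual data are in Appendix 9.2.

```python
def randomized_block_anova(scores):
    """Randomized-block ANOVA for scores[treatment][block] (one value per cell)."""
    t, b = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (t * b)
    treat_means = [sum(row) / b for row in scores]
    block_means = [sum(scores[i][j] for i in range(t)) / t for j in range(b)]
    ss_treat = b * sum((m - grand) ** 2 for m in treat_means)
    ss_block = t * sum((m - grand) ** 2 for m in block_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_resid = ss_total - ss_treat - ss_block
    df_treat, df_block = t - 1, b - 1
    df_resid = df_treat * df_block
    ms_resid = ss_resid / df_resid
    return {
        "ss": (ss_treat, ss_block, ss_resid, ss_total),
        "df": (df_treat, df_block, df_resid),
        "f_treat": (ss_treat / df_treat) / ms_resid,
        "f_block": (ss_block / df_block) / ms_resid,
    }
```

For 4 configurations and 5 films the same formulas give 3, 4, and 12 degrees of freedom for treatments, blocks, and residuals. A large block F-ratio alongside a small treatment F-ratio is the pattern the text describes, where film-to-film variation dominates.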

Processing configuration #4 was chosen for incorporation in ANDS. A table of data
from which the ANOVA was computed (see Appendix 9.2) indicates that methods #1 and #3 have
the lowest levels of performance (with means of 0.71 and 0.68, respectively) over all 5 films.
Configurations #2 and #4 have the highest levels of performance (means of 0.79 and 0.76,
respectively). The choice is between methods #2 and #4. Although method #4 does not have
the highest performance, it was chosen as the basis of ANDS because it was close to #2 (within
4%) and because it requires one less step (spline smoothing following spline filtering). Also,
configuration #4 is faster, requiring one less step than configuration #2. The final processing
configuration of ANDS based on this analysis is:

#4  Spline filter with histogram equalization
    Candidate nodule detection
    Sparse convolution smoothing
    Vote accumulation.


3.3  -  Tuning  the  Parameters  of  ANDS 


The optimal processing configuration, #4, was evaluated at 4 radius values (8, 10, 12, and
20; which correspond to nodules from .75 to 2.0 centimeters in diameter), 4 knot spacing values (20,
40, 60, and 120), and 3 resolutions (rescaling by a factor of 2 between spline filtered image and CN
centers image; original image rescaled by a factor of 2; and no rescaling) on six films (6, 18, 32, 36,
41, and 44) to determine which configuration of parameters produces optimal detection of the
nodules present in the films. The three resolution conditions are illustrated in Fig. 3.3.1. The two
resolutions which involve rescaling the image were evaluated because I believed that ANDS would
take less time to compute with these smaller images and that the results might be acceptable. The
distance metric was used to evaluate the performance of these three parameters. The optimal image
resolution was first chosen and then an ANOVA was performed on the remaining parameters at this
resolution to determine which parameter contributes a statistically significant amount of variation,
given the following model:


$$Y_{ijk} = \eta + F_i + R_j + K_k + F_i \times R_j + F_i \times K_k + \epsilon_{ijk}$$

where:

Y_ijk = observed mean

η = effects due to overall mean

F_i = effects due to films

K_k = effects due to knot spacing value

R_j = effects due to radius value

F_i x R_j = effects due to interactions between film and radius

F_i x K_k = effects due to interactions between film and knot

ε_ijk = residual effects.


If a parameter makes a statistically significant contribution to the total variance then one
might infer that the value of that parameter has an effect on detection. Furthermore, one of the
values of the significant parameter might result in better detection than the other values.
The values of the parameters of ANDS are set at the optimal values of the statistically significant
parameters, which were determined by this ANOVA.


Resolution 1: No rescaling.

    Ig[x:M, y:N]

Resolution 2: Accumulator rescaling. Computed image center coordinates are rescaled before entry
into rescaled accumulator image.

    Sg[x:M, y:N]  ->  c(Sg[x:M, y:N], rad, res)  ->  Cg[x:M/res, y:N/res]
    (Sg: filtered image)

Resolution 3: Original image is rescaled by a factor of 2.

    Ig[x:M, y:N]  ->  z(Ig[x:M, y:N], factor)  ->  Ig[x:M/factor, y:N/factor]
    (Ig: input image)


Due to limitations imposed on the amount of available computing time and the desire to
finish this work in a reasonable amount of time, the scope of the parameter testing had to be
limited to 6 films, 4 radius values, 4 knot spacing values, and 3 resolutions (a total of 288 runs).
Thus, the optimal parameter values that are reported here are coarse global estimates. Each run
averaged about 1.25 hours of real time (15' for file transfer from RIG, checking file for errors,
reorienting image, and rescaling image; 20' for spline filtering; 15' for detection of centers of CNs;
20' for sparse convolution smoothing; and 5' for compilation of the list of accumulator values). The
entire parameter test required about 360 hours or 15 days of real time to compute. The actual
runtime was significantly longer because of failures or rebootings of the machines in the distributed
network.
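
The run count and time totals quoted above follow directly from the stated figures, as this short check shows (the variable names are illustrative):

```python
# Size of the parameter test, from the counts given in the text.
films, radii, knots, resolutions = 6, 4, 4, 3
runs = films * radii * knots * resolutions

# Per-run time in minutes: transfer and preparation, spline filtering,
# CN center detection, sparse convolution smoothing, list compilation.
minutes_per_run = 15 + 20 + 15 + 20 + 5

total_hours = runs * minutes_per_run / 60
total_days = total_hours / 24
print(runs, minutes_per_run / 60, total_hours, total_days)
```

The product gives 288 runs at 1.25 hours each, which is 360 hours or 15 days.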


The resolution that is used in ANDS was chosen by inspection of the 95% confidence
intervals of the DM means computed over all films, radius values, and knot spacing values. These
confidence intervals are illustrated in Fig. 3.3.2. Here, the resolution with the lowest average DM,
original image rescaled by a factor of 2, is statistically distinguishable from the other two
resolutions.


[Plot axis: Mean Distance Metric]

Figure 3.3.2 - 95% confidence intervals on the mean Distance Metrics computed over all films, radius values,
and knot spacings for three image resolutions: res1, no rescaling of image; zoom2, rescaling of input image
by a factor of 2; and res2, rescaling between spline filtered image and CN center image by a factor of 2.


An ANOVA was computed over all films, knot spacing values, and radius values for data
derived at the chosen resolution. The results of this ANOVA are presented in Table 3.3.1. Effects
due to radius and the radius/film interaction are statistically significant at the α = 0.05 level. Effects
due to films, knot spacing values, and the film/knot interaction are not statistically significant at the
α = 0.05 level. The effects due to knot spacing would be significant at an α-level slightly greater
than 0.05.


Source         Sum of Squares     ν     Mean Square     F-ratio

Film               3.6120         5       0.0380          0.96
Radius             3.4504         3       1.1501         29.01*
Knot               0.8602         3       0.2867          7.23
Film x Radius      2.1724        15       0.1448          3.65*
Film x Knot        1.2522        15       0.0835          2.11
Error              2.1408        54       0.0396
TOTAL             13.4881        95       0.1420

Table 3.3.1 - Results of ANOVA that was performed over all films, which were rescaled by a factor of 2, at
all knots and radii. Effects due to radius and film/radius interaction are statistically significant at
alpha=0.05 (P < 0.01).

Since this analysis failed to show a statistically significant contribution due to knot
spacing and the film/knot interaction, a single knot spacing parameter value is incorporated in
ANDS. This is the value that results in the lowest DM over all films and radii. The confidence
intervals from which this choice was made are given in Fig. 3.3.3. This decision is ad hoc because
there is no statistically significant difference evident in the confidence intervals. A knot spacing
value of 60 was chosen as the parameter of ANDS because this value has the lowest mean, see Fig.
3.3.3.


The significance of the radius and film/radius interaction effects suggests that no single radius
value would suffice in producing optimal detection over all films. Thus, two radius values, 10 and
20, were chosen as the parameters of ANDS. Note: because the chosen resolution is rescaling by a
factor of 2, the knot spacing and radius values are reduced by a factor of 2; so, a knot spacing value
of 30 and radii of 5 and 10 were incorporated into ANDS. Fig. 3.3.4 illustrates the 95% confidence
intervals on the DM means; these were computed over all radius values and knot spacings. A radius
value of 10 pixels (~0.5 cm.) was chosen for the final ANDS because it has the lowest mean DM,
although no statistically significant difference is apparent. Although the radius value of 20 pixels
(~1.0 cm.) shows a poorer performance (higher DM mean), it was also chosen for incorporation into
ANDS because it corresponds to larger nodules.


4 - RESULTS - Evaluation of the Performance of AI in Reducing the False Positive
Rate


Given an automated nodule detection system that has been designed to provide optimal
detection (over the films and system parameters that were tested) of any pulmonary nodules that are
present in a chest radiograph, the final phase of this work is to reduce the number of false positives
that are reported by that system. Concomitant with this goal is the mandate of not greatly reducing
the number of true positives.

Seven films were omitted from the final test. Films 1, 22, 28, 31, and 33 were omitted because of
errors in digitization (see Fig. 4.1) that prevented computation of the lung borders, which are
required by the pattern recognizer for determination of the relative location measures. Film 38 was
omitted because the nodule was always detected. The nodule is in the lower medial corner of the
right lung, see Fig. 4.2. The lung borders in the corner of this image act as edges on the border of
the nodule. Film 41 was omitted because its image file was accidentally smashed following parameter
tuning; time did not permit redigitization. Appendix 9.3 provides the statistics about each film in
the ANDS database and summarizes the results of these tests.

Forty-three films were processed at two radii (5 and 10, which correspond to 0.5 and 1.0
cm, respectively) by ANDS with AI under two conditions and without AI. In the two conditions
with AI, the pattern classifier was trained with different numbers of films and tested on the entire
database. In the first case, it was trained with 9 films and in the second with 37 films (films that
contained nodules whose nodule statistics could be computed).

The nodule appearance statistics, from which the features that are used to train the pattern
classifier are derived, are computed from a smoothed, histogram equalized, windowed region around
the CN from the spline filtered image. The top 50 points in the accumulator lists obtained when
testing ANDS on the database at two radii were classified (some could not be classified because a
64x64 pixel window around the CN could not be made or because the appearance statistics could not
be computed). These classifications were used to train the pattern recognizer. The feature vectors
were input to BMDP7M, which runs on the DEC-10.


Figure 4.1 - Image with incomplete scanlines, presumably attributable to the digitizer. The lung borders could
not be computed in images with incomplete scanlines. These images have been omitted from the films
evaluated by ANDS with AI because without them the relative distance features could not be computed.


Figure 4.2 - Lung #38 was omitted from the evaluation of ANDS because its nodule is always detected
regardless of parameters and processing configurations. The nodule is in the lower medial corner of the
right lung. The lung borders coincide with the margin of the nodule and vote for the nodule.




This statistics package computed the weights and constants that are used by the discriminant
function. These values were instantiated in the pattern classifier. The input to the Nodule Expert
is the feature vector of a CN. The output of the Nodule Expert is a decision: whether or not a
nodule was detected. If a CN is recognized as a false positive, it is omitted from the list of CNs
that is reported by ANDS. Only CNs that are classified as a nipple or any kind of nodule are kept
in the screened list of CNs. The performance of the Nodule Expert is evaluated by subjecting this
list of CNs to the performance evaluation procedures that are described in Chapter 2.4. Fig. 4.5
illustrates the result of applying 37-trained ANDS on lung #9 at a radius of 5 pixels.
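
The screening step performed by the Nodule Expert can be sketched as follows. This is a hedged illustration rather than the thesis code: it assumes one linear classification function (a weight vector plus a constant) per class, with the CN assigned to the class that scores highest. The class names, weights, and the `classify` and `screen` helpers are all hypothetical.

```python
def classify(features, functions):
    """Return the class whose linear classification function scores highest.

    functions maps a class name to (weights, constant).
    """
    def score(name):
        weights, constant = functions[name]
        return constant + sum(w * f for w, f in zip(weights, features))
    return max(functions, key=score)


def screen(candidates, functions, keep=("nodule", "nipple")):
    """Keep only the CNs classified as a nipple or a nodule."""
    return [cn for cn in candidates
            if classify(cn["features"], functions) in keep]
```

A CN assigned to any other class is treated as a false positive and dropped from the reported list, which is how the list of accumulator points is shortened.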

; >>>> DISTANCE Metric for /u/bill/thesis/metric/preAI/191rbk3Ur2p4 <<<<
; >>>>> Cum Histo Metric <<<<<
;   Number of false positives - # of points which are not nodules, that lie between
;     the first accumulator point and the last detected nodule.
;   Percentage of the 2 known nodules which were detected.
;   Number of points in accumulator.
;   Percentage of positives in 1-th group of 10 accumulator points.
;   Percentage of positives in 2-th group of 10 accumulator points.
;   Percentage of positives in 3-th group of 10 accumulator points.
;   Percentage of positives in 4-th group of 10 accumulator points.
;   Percentage of positives in 5-th group of 10 accumulator points.

     0.  Acc[141, 142] = 9792
     1.  Acc[132, 200] = 9504
     2.  Acc[277, 278] = 9120
     3.  Acc[120, 222] = 8896
  •  4.  Acc[353, 117] = 8544
     5.  Acc[145, 292] = 8064
     6.  Acc[273, 200] = 7840
     7.  Acc[283, 250] = 7648
     8.  Acc[305, 236] = 7648
     9.  Acc[148, 190] = 7648
    10.  Acc[291, 285] = 7616
    11.  Acc[105, 120] = 7328
    12.  Acc[258, 283] = 7264
    13.  Acc[155, 288] = 7264
    14.  Acc[132, 215] = 7232
    15.  Acc[164, 163] = 7008
    16.  Acc[369, 221] = 6944
    17.  Acc[278, 215] = 6880
    18.  Acc[368, 209] = 6848
    19.  Acc[129, 164] = 6848
    20.  Acc[367, 126] = 6816
    21.  Acc[130, 111] = 6816
    22.  Acc[147, 165] = 6784
    23.  Acc[153, 127] = 6688
    24.  Acc[299, 137] = 6624
    25.  Acc[188, 332] = 6592
  • 26.  Acc[ 69, 113] = 6592
    27.  Acc[264, 320] = 6560
    28.  Acc[124, 241] = 6560
    29.  Acc[163, 224] = 6528
    30.  Acc[163, 352] = 6400
    31.  Acc[253, 245] = 6400
    32.  Acc[173, 246] = 6368
    33.  Acc[ 58, 176] = 6304
    34.  Acc[321, 254] = 6240
    35.  Acc[297, 261] = 6240
    36.  Acc[ 56, 217] = 6208
    37.  Acc[304, 177] = 6208
    38.  Acc[256, 216] = 6176
    39.  Acc[301, 202] = 6176
    40.  Acc[317, 187] = 6176
    41.  Acc[258, 261] = 6144
    42.  Acc[367, 189] = 6144
    43.  Acc[292, 151] = 6144
    44.  Acc[186, 306] = 6080
    45.  Acc[175, 265] = 6080
    46.  Acc[ 99, 167] = 6080
    47.  Acc[371,  81] = 6016
    48.  Acc[ 56, 157] = 5952
    49.  Acc[106, 355] = 5920

; >>>> DISTANCE Metric for /u/bill/thesis/metric/postAI2/191rbkJUr2pa <<<<
; >>>>> Cum Histo Metric <<<<<
;   Number of false positives - # of points which are not nodules, that lie between
;     the first accumulator point and the last detected nodule.
;   Percentage of the 2 known nodules which were detected.
;   Number of points in accumulator.

     0.  Acc[353, 117] = 8544
     1.  Acc[105, 120] = 7328
     2.  Acc[ 69, 113] = 6592
     3.  Acc[124, 241] = 6560
     4.  Acc[321, 254] = 6240

Figure 4.5 - The result, bottom, of applying AI techniques (37-trained) to reduce the false
positives in the list of accumulator points produced by the unintelligent ANDS system, top.
At the top of each list is a summary of the performance of ANDS. Data such as these are used
to assess the effects of the AI techniques on ANDS.
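
The false positive count that heads each listing in Figure 4.5 can be sketched in a few lines. This is a minimal illustration and not the thesis's own code: the function name `count_false_positives` is hypothetical, and the sketch assumes that the two bulleted accumulator points in the top listing (ranks 4 and 26) are the film's two known nodules.

```python
def count_false_positives(is_nodule):
    """Count non-nodule points ranked at or above the last detected nodule.

    `is_nodule` is the accumulator list in rank order, True where the
    point is a known nodule. Returns None when no nodule is in the list.
    """
    nodule_ranks = [rank for rank, hit in enumerate(is_nodule) if hit]
    if not nodule_ranks:
        return None  # nothing detected, so the metric is undefined
    last = nodule_ranks[-1]
    # Points from the first accumulator point through the last detected
    # nodule, minus the nodules themselves.
    return (last + 1) - len(nodule_ranks)


# Assumed nodule positions, read from the bullets in the listing.
naive_list = [rank in (4, 26) for rank in range(50)]
screened_list = [True, False, True, False, False]

print(count_false_positives(naive_list))     # 25
print(count_false_positives(screened_list))  # 1
```

Under these assumptions the screened list needs only one non-nodule site to be inspected before both nodules have been seen, against 25 for the unscreened list.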
4 - RESULTS - Evaluation of the Performance of AI in Reducing the False Positive Rate


The performance of the pattern classifier may be visualized in a classification matrix,
where the rows represent classes that were taught and the columns represent the classification results
of the pattern classifier. Some of the CNs that do not neatly fit into any of the eleven classes were
classified as Undetermined (UN) when training the pattern classifier. CNs that are classified as
Undetermined were not used when determining the discriminant function. The classification matrix
obtained from running ANDS on only the training films when 9-trained is presented in Table 4.1.
The classification matrix obtained when running 37-trained ANDS on the training films is presented
in Table 4.2. Tables 4.3 and 4.4 present the classification matrices obtained when testing the 9- and
37-trained ANDS on the entire database.


Known                          Classified As:
class   pct   ri   SR   sv   lv   SN   MN   LN   lb   mb   SB   NI   un  Count
RIB    0.53   10    4    1    2    0    1    1    0    0    0    0    0     19
SR     0.43    1    3    0    0    1    1    0    0    0    0    1    0      7
SV     0.73    5    0   62   11    5    1    1    0    0    0    0    0     85
LV     0.67    1    0    7   26    0    0    4    0    1    0    0    0     39
SN     0.50    0    2    2    0    5    1    0    0    0    0    0    0     10
MN     0.56    1    2    0    0    0    5    0    0    0    0    1    0      9
LN     0.80    1    0    0    0    0    0    4    0    0    0    0    0      5
LB     1.00    0    0    0    0    0    0    0   11    0    0    0    0     11
MB     0.88    0    1    2    5    0    0    1    0   63    0    0    0     72
SB     0.33    0    0    0    3    0    1    0    0    0    2    0    0      6
NI     0.50    1    0    1    0    0    0    0    0    0    0    2    0      4
UND    0.00   45    7   25   12   15    3   11    3    2    1   11    0    135

Table 4.1 - Classification matrix for training films for 9-trained ANDS (5, 6, 8, 12, 16, 18, 32, 36, 44)
when tested on training films. The abbreviations for the classifications are defined in 2.5.0. 402 CNs were
evaluated.
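
The "pct" column of a classification matrix is the diagonal count of a row divided by the row total. The sketch below reproduces two entries of Table 4.1 under that reading; the names `COLUMNS`, `TABLE_4_1_ROWS`, and `fraction_correct` are illustrative and do not come from ANDS.

```python
# Column order of the "Classified As" axis, written with the row labels.
COLUMNS = ["RIB", "SR", "SV", "LV", "SN", "MN", "LN", "LB", "MB", "SB", "NI", "UND"]

# Two rows copied from Table 4.1 (counts of CNs per assigned class).
TABLE_4_1_ROWS = {
    "RIB": [10, 4, 1, 2, 0, 1, 1, 0, 0, 0, 0, 0],
    "SV": [5, 0, 62, 11, 5, 1, 1, 0, 0, 0, 0, 0],
}


def fraction_correct(known_class, row):
    """Diagonal entry of a classification-matrix row over the row total."""
    return row[COLUMNS.index(known_class)] / sum(row)


for name, row in TABLE_4_1_ROWS.items():
    # Prints the class, its Count, and its pct entry.
    print(name, sum(row), round(fraction_correct(name, row), 2))
# RIB 19 0.53
# SV 85 0.73
```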


The performances of the Rib and Vascularity Experts are presented in Table 4.5. Here,
the row represents all CNs that were detected by these experts and the columns are the class to
which the CNs belong (specified when training the pattern classifier). These results were obtained
when 9- and 37-trained ANDS were run on the entire database. The performance of the Nodule
Expert at both trainings on both the training films and the entire database is presented in Table
4.6. Since several films contain more than one nodule, it is more meaningful (from the patients'
viewpoint) to talk about nodule detection in terms of films which contain nodules that are missed
(that is, the nodules that are present are not recognized by the Nodule Expert) rather than absolute
percentages of nodules that are missed in a given film. Fig. 4.6 compares the 95% confidence
intervals on the average percent correct classification for 9- and 37-trained ANDS when tested on
training films and on the entire database.


Known                          Classified As:
class   pct   ri   SR   sv   lv   SN   MN   LN   lb   mb   SB   NI   un  Count
RIB    0.54   90   10   12    7    3   13    8    0    1    4   20    0    168
SR     0.54    1    7    1    0    2    2    0    0    0    0    0    0     13
SV     0.65   14    5  275   72   11   18    1    0    9   15    1    0    421
LV     0.73    1    0   25  136    0    5    4    0   13    2    0    0    186
SN     0.53    1    2    5    0   16    0    0    0    0    1    5    0     30
MN     0.27    3    4    3    8    2   14    4    0    3    3    7    0     51
LN     0.75    0    0    0    2    0    1    9    0    0    0    0    0     12
LB     0.96    1    0    0    0    0    0    0  193    1    0    5    0    200
MB     0.82    2    0   18   31    2    1    0    0  429   36    1    0    520
SB     0.58    0    0    1    1    1    0    0    0    2    7    0    0     12
NI     0.64    0    0    2    0    2    0    0    0    0    0    7    0     11
UND    0.00  158   39  183   30   76   95   22    0    7   29   66    0    705

Table 4.2 - Classification matrix for training films for 37-trained ANDS (trained on all films that
contain any nodule(s)) when tested on training films. 2329 CNs were evaluated.

Known                          Classified As:
class   pct   ri   SR   sv   lv   SN   MN   LN   lb   mb   SB   NI   un  Count
RIB    0.50  106   34   18    9   14    1    8    1    1    0   21    0    213
SR     0.29    3    4    0    1    2    1    0    0    0    0    3    0     14
SV     0.68   30    0  332   98    9    2    6    0   13    0    1    0    491
LV     0.67    5    0   17  151    0    0   16    0   17    0    0    0    226
SN     0.57    0    3    6    0   17    3    0    0    0    0    1    0     30
MN     0.18   12    7    5    8    1    9    1    0    2    1    5    0     51
LN     0.33    4    1    0    1    0    2    4    0    0    0    0    0     12
LB     0.97    1    2    0    0    1    0    0  241    0    1    2    0    248
MB     0.79    0    1   42   51    1    0    6    7  475   18    0    0    601
SB     0.25    0    0    3    2    0    1    0    0    2    3    0    0     12
NI     0.23    2    5    1    0    2    0    0    0    0    0    3    0     13
UND    0.00  246   62  187   60  133   10   30    7   12    4   88    0    839

Table 4.3 - Classification matrix for 9-trained ANDS (5, 6, 8, 12, 16, 18, 32, 36, 44) when tested on entire
ANDS database. 2750 CNs were evaluated.
Known                          Classified As:
class   pct   ri   SR   sv   lv   SN   MN   LN   lb   mb   SB   NI   un  Count
RIB    0.54  116   11   20    7    5   17    8    0    2    4   23    0    213
SR     0.50    1    7    1    0    3    2    0    0    0    0    0    0     14
SV     0.64   14    5  316   96   11   19    1    0   11   17    1    0    491
LV     0.74    1    0   30  168    0    6    4    0   16    3    0    0    227
SN     0.53    1    2    5    0   16    0    0    0    0    1    5    0     30
MN     0.27    3    4    3    8    2   14    4    0    3    3    7    0     51
LN     0.82    0    0    0    1    0    1    9    0    0    0    0    0     11
LB     0.96    2    0    0    0    0    0    0  239    1    0    6    0    248
MB     0.82    3    0   22   34    4    1    0    0  493   42    2    0    601
SB     0.58    0    0    1    1    1    0    0    0    2    7    0    0     12
NI     0.54    0    1    2    0    2    0    1    0    0    0    7    0     13
UND    0.00  187   48  225   34   92  104   23    0   10   35   81    0    839

Table 4.4 - Classification matrix for 37-trained ANDS (trained on all films that contain any nodule(s)) when
tested on entire ANDS database. 2750 CNs were evaluated.


Performance of Rib Expert on Entire Database

Rib                              Taught As:
Expert  pct cor   ri   SR   sv   lv   SN   MN   LN   lb   mb   SB   NI   un  Count
RIB        0.15   47    1   27   15    4    3    3   33   87    3    1   87    311

213 ribs were taught
311 CNs were classified as rib by the Rib Expert


Performance of Vascularity Expert on Entire Database

Vasc                             Taught As:
Expert  pct cor   ri   SR   sv   lv   SN   MN   LN   lb   mb   SB   NI   un  Count
VASC       0.46   16    0   73   32    0    0    1    0   71    0    0   35    228

717 vascular structures were taught
228 CNs were classified as vascularity by the Vascularity Expert


Performance of Vision Experts when Tested on Entire Database

Expert        Sensitivity   True Positive Rate
Vascularity   0.15          0.47
Rib           0.22          0.15


Table 4.5 - Performance of vision experts when tested on entire database. The matrices illustrate the
classifications to which the CNs detected by the experts belong. Sensitivity is the fraction of the CNs
taught as a given class that the expert detected; the true positive rate is the fraction of the CNs detected
by the expert that in fact belong to the correct class.
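
The two summary figures for each expert follow directly from the counts in Table 4.5. The sketch below is an illustration, with the hypothetical helper name `expert_rates`, and it assumes that "correct" means a CN taught as the expert's own class (rib for the Rib Expert; small or large vascularity for the Vascularity Expert).

```python
def expert_rates(correct, taught, flagged):
    """Return (sensitivity, true positive rate) for one vision expert.

    correct: CNs flagged by the expert that were taught as its class
    taught:  all CNs taught as that class
    flagged: all CNs the expert assigned to its class
    """
    return correct / taught, correct / flagged


# Counts copied from Table 4.5.
rib = expert_rates(correct=47, taught=213, flagged=311)
vasc = expert_rates(correct=73 + 32, taught=717, flagged=228)

print([round(x, 2) for x in rib])   # [0.22, 0.15]
print([round(x, 2) for x in vasc])  # [0.15, 0.46]
```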




Performance of Nodule Expert on Entire Database

Training      Sensitivity   True Positive Rate
9-trained     0.58          0.14
37-trained    0.76          0.15


Table 4.6 - Performance of Nodule Expert when tested on entire database at both trainings. This is the
performance of the pattern classifier and the classification rule. These results represent the detection
performance of ANDS on all nodules in the films that comprise the database. That is, these values reflect
the ability of ANDS to detect all the nodules in the database.

Figure 4.6 - The 95% confidence interval on the average percent correct classification of ANDS when 9-
and 37-trained, both when tested on training films and on the entire database. (Panels: Tested on All
Films; Tested on Training Films. Horizontal axis: Average Percent Correct Classification, 0 to 1.0.)


Table 4.7 compares the changes in DM, false positive and true positive between both
trainings (9 and 37) and the naive (no AI) ANDS. The true positive rate reported is the average of
the true positive rates for each film, where the true positive rate is the fraction of known nodules
that is detected. False positive represents the number of non-nodules that lie between the first
accumulator point and the last detected nodule. The value reported is the average over all films.
The average DM over all films is also reported. Only processing configurations (film and radius)
at which any nodule was detected in both the pre-training and the trained systems are included in
these analyses. Student's t-statistic is computed for the three metrics, as are P-values.
Comparison of the Performance of 9-Trained and naive ANDS

                                      METRIC
ANDS          DM            # of False Positive   True Positive Rate
9-trained     0.353         0.588                 0.755
naive         0.550         12.235                0.827
Difference    -0.197        -11.647               -0.072
t-statistic   -3.96*        -5.20*                -3.00*
P-Value       P < 0.0005    P << 0.0005           0.0005 < P < 0.005


Comparison of the Performance of 37-Trained and naive ANDS

                                      METRIC
ANDS          DM              # of False Positive   True Positive Rate
37-trained    0.462           1.809                 0.787
naive         0.594           11.915                0.875
Difference    -0.132          -10.106               -0.088
t-statistic   (illegible)     -5.67*                -2.98*
P-Value       0.01 < P < 0.05 P << 0.0005           0.0005 < P < 0.005

Table 4.7 - Comparisons of the performance of 9- and 37-trained ANDS with the naive ANDS. These
comparisons are made over ANDS configurations (film and radius) in which nodules were detected in both
naive and trained systems. That is, the differently trained ANDS that are evaluated here were tested on all
37 films in the database. However, it only makes sense to compare films that contain nodules that were
detected by the naive system if one is to assess improvement/changes in the detection metrics. Since each
film was processed at two radii (there are 37 films in which a nodule was detected) there are 74 possible
film/radius combinations that could be compared in the above analyses. Only film/radius combinations in
which a nodule was detected by the trained system are included in the comparisons. Thirty-four film/radius
combinations (26 different films) were evaluated by the 9-trained ANDS and 47 (32 different films) by the
37-trained ANDS; the reported metrics are means over these numbers of film/radius combinations.
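
A comparison of this kind, made over the same film/radius combinations for the naive and the trained system, can be sketched as a paired t-statistic. The source says only that Student's t-statistic was computed, so the pairing is an assumption here; the function name `paired_t` and the five false positive counts are hypothetical and are not data from this work.

```python
import math
import statistics


def paired_t(trained, naive):
    """Mean paired difference (trained - naive) and its t-statistic."""
    diffs = [t - n for t, n in zip(trained, naive)]
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference, using the sample
    # standard deviation (n - 1 in the denominator).
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, mean_diff / std_err


# Hypothetical false positive counts for five film/radius combinations.
naive_fp = [12, 9, 15, 11, 14]
trained_fp = [1, 0, 2, 1, 3]

mean_diff, t_stat = paired_t(trained_fp, naive_fp)
print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.2f}, df = {len(naive_fp) - 1}")
```

A negative t-statistic indicates that the trained system reports fewer false positives than the naive system on the same configurations; the P-value is then read from a t table at n - 1 degrees of freedom.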


Since it is clinically more important not to miss a radiograph that contains a nodule than
to recognize every nodule in a film, Table 4.8 summarizes the false negative rates (in terms of films
that were missed) of ANDS at the three trainings.




False Negative Rates For Films

ANDS Training   Single Nodules   Multiple Nodules (1)   All Films
Naive (2)       0.12             0.00                   0.08
9-trained       0.35             0.18                   0.30
37-trained      0.19             0.00                   0.14

1 - The average number of nodules per film is 7.5, s = 11.4.

2 - The top 50 points in the accumulator list are considered.


Table 4.8 - False negative rates at three trainings. The false negative rates are summarized at the three
trainings over all films with one nodule, all films with more than one nodule, and all films with any nodule.
Twenty-six films contain a single nodule; 11 films contain multiple nodules.
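
The "All Films" column can be checked against the counts of missed films given with Tables 4.9 to 4.11 (3, 11, and 5 of the 37 films that contain nodules). The helper name `film_rates` below is illustrative only.

```python
FILMS_WITH_NODULES = 37

# Films in which no nodule was detected, per Tables 4.9, 4.10 and 4.11.
MISSED_FILMS = {"naive": 3, "9-trained": 11, "37-trained": 5}


def film_rates(missed, total=FILMS_WITH_NODULES):
    """Return (false negative rate, true positive rate) in terms of films."""
    false_negative = missed / total
    return false_negative, 1 - false_negative


for training, missed in MISSED_FILMS.items():
    fn, tp = film_rates(missed)
    print(f"{training}: FN = {fn:.2f}, TP = {tp:.2f}")
# naive: FN = 0.08, TP = 0.92
# 9-trained: FN = 0.30, TP = 0.70
# 37-trained: FN = 0.14, TP = 0.86
```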


Table 4.9 summarizes the films that were missed by the naive ANDS and which were
consequently missed by the trained systems. Table 4.10 summarizes the misclassifications of the
CNs in films that were misdiagnosed (that is, no nodule was detected) by ANDS at both trainings.


Summary of Films That Were Misdiagnosed by Naive ANDS

FILM #   Description of Nodule

24       faint shadow of a button
         on lateral border of right lung

42       pseudo-nodule; near bottom
         apex of right lung; elongated
         vertically; well defined margins
         but non-uniform interior density

43       granuloma; nodule is fuzzy;
         is on bottom medial border of
         left lung; poorly defined margin;
         non-uniform interior density


Table 4.9 - The solitary abnormalities in these three films were not detected by the untrained ANDS at radii
of 5 or 10 (pixels). Since the nodules in these films were missed by the untrained system, they were also
missed by the trained systems because the trained systems use the accumulator list that is output by the
naive system as their input. If a nodule is not anywhere in the list of accumulator points that is produced
by the naive system it cannot be detected by a trained system.


Summary of Films That Were Misdiagnosed by 9-trained ANDS

Film #; Radius   Comment on Classification

2; 5             taught as medium nodule, classified as
                 rib; nodule is fibrous and on rib

6; 5             taught as nodule on medial border,
                 classified as large vascularity; nodule
                 is well defined; interior mass is
                 relatively uniform and noticeably
                 brighter than exterior

7; 10            taught as medium nodule, classified as
                 large vascularity; nodule margin is well
                 defined and interior brightness is
                 uniform; nodule is near medial border
                 above the medial midpoint; diameter is
                 greater than 2.0 cm.

20; 5            taught as medium nodule, classified as
                 rib; >2 cm diameter nodule is occluded
                 by clavicle

35; 5            taught as medium nodule, classified as
                 rib; nodule is occluded by rib; margin is
                 fuzzy, interior is uniform

39; 5            two nodules are present in film;
                 [157, 241] taught as medium nodule,
                 classified as small vascularity; well
                 defined margin; near medial and bottom
                 borders of right lung;
                 [371, 296] taught as medium nodule,
                 classified as rib; fuzzy, darker, somewhat
                 horizontally elongated in left lung

39; 10           two nodules are present in film;
                 [155, 246] taught as medium nodule,
                 classified as small vascularity; (see above)
                 [372, 289] taught as medium nodule,
                 not classified because of error in
                 computing nodule feature statistics

40; 5            two nodules are present in film;
                 [132, 231] taught as small nodule on
                 border, classified as small vascularity;
                 [367, 277] not taught - unable to compute
                 nodule statistics

40; 10           [132, 231] taught as small nodule on
                 border, classified as small vascularity;
                 [367, 277] not taught - unable to compute
                 nodule statistics

44; 5            taught as nipple, classified as small
                 vascularity; nipple is near mediastinum

44; 10           taught as nipple, classified as rib; no
                 overlapping rib


Table 4.10 - Summary of the films that were missed by the 9-trained ANDS. The above films, in addition
to those in Table 4.9, were missed by 9-trained ANDS. The above film/radius combinations were detected
by the untrained ANDS. The right column explains why the nodule(s) in these films was/were missed.
Eleven films were missed by 9-trained ANDS (of 37 films that contain nodules).


Summary of Films That Were Misdiagnosed by 37-trained ANDS

Film #; Radius   Comment on Classification

7; 10            taught as medium nodule,
                 classified as large vascularity

44; 5, 10        taught as nipple, classified as
                 small vascularity


Table 4.11 - Summary of the films that were missed by the 37-trained ANDS. The above films, in addition
to those in Table 4.9, were missed by 37-trained ANDS. The above film/radius combinations were detected
by the untrained ANDS. The right column explains why the nodule(s) in these films was missed. Five
films were missed by 37-trained ANDS (of 37 films that contain nodules).


Table 4.12 illustrates the number of CNs from the top of the list of reported candidate
nodules that a radiologist must inspect before being 95% or 99% confident of having read a nodule
that was detected by ANDS, if one is present in the film. These values are the upper limits of the
respective confidence limits on the means that are presented in Table 4.7. The confidence levels are
presented for naive, 9-trained, and 37-trained ANDS. The values for the naive system are based on
the results for the 47 film/radius combinations in which nodules were detected by 37-trained ANDS.
ANDS Training   Confidence Level   True Positive Rate
                95%      99%       All Films
Naive           11       12        .92
9-trained       2        2         .70
37-trained      3        4         .86


Table 4.12 - The number of CN sites from the top of the list of CNs which a radiologist must inspect in
order to be 95% or 99% confident of having read a nodule. The 9-trained limits are based on a mean that is
obtained from 34 film/radius combinations and the 37-trained limits derive from a mean that is obtained
from 47 film/radius combinations. The true positive rate is the percentage of all films with at least one
nodule that was correctly diagnosed by ANDS.
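
An upper confidence limit on a mean number of sites can be sketched as below. The source does not state which formula was used, so this is an assumption-laden illustration: it uses a one-sided limit with a normal approximation (a t quantile would be slightly larger for small samples), and the function name `upper_confidence_limit` and the sample counts are hypothetical.

```python
import math
import statistics


def upper_confidence_limit(samples, confidence):
    """One-sided upper confidence limit on the mean (normal approximation)."""
    z = statistics.NormalDist().inv_cdf(confidence)
    mean = statistics.mean(samples)
    std_err = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean + z * std_err


# Hypothetical numbers of sites inspected before the last detected nodule.
sites = [1, 2, 3, 2, 1, 2, 3, 2]

for level in (0.95, 0.99):
    limit = upper_confidence_limit(sites, level)
    # Round up, since a radiologist inspects a whole number of sites.
    print(f"{level:.0%}: inspect at most {math.ceil(limit)} sites")
```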


5  -  DISCUSSION 


Panoramic  View 

As a result of this work, an Automated Nodule Detection System, based on a system presented by
Ballard (Ballard, 1974), was designed, tuned, and tested on a database of 43 chest radiographs. The
reproduction of the original radiographs to digital images was carefully determined and controlled
so that a linear transfer was obtained. The final ANDS design was chosen from four candidate
system configurations. An ANOVA failed to find any significant difference in nodule detection
ability between the tested configurations when evaluated with six films; this failure was primarily
due to a large film-to-film variation which masked any difference due to processing configuration.
The configuration that was implemented was chosen primarily because it requires one less step (and
is consequently faster) than the top-performing configuration and because its performance measure
is within 4% of that configuration.

The parameters of ANDS (knot spacing, radius, and image resolution) that result in optimal
detection of nodules in five films were determined when tuning ANDS. The knot spacing value,
the parameter of the spline filter, was found to have no statistically significant effect on the
detection of the nodules of various sizes in the five films that were evaluated. A knot spacing of
60 pixels was chosen because this results in the highest mean detection performance over all five
films tested, although this advantage is not statistically significant (α = 0.05). The amount of
variation contributed by the radius value that is used by the CN Expert was shown to be
statistically significant (α = 0.05) and to have a statistically significant interaction with film
(that is, nodule size). Two radius values were chosen and are implemented in ANDS. These radius
values are 5 and 10 pixels. A radius value of 5 pixels was chosen because the detector performance
on all five films was the highest at this value, although the performance at this value is not
statistically different from those of the other tested values. A radius value of 10 pixels was chosen
because it corresponds to a 2 centimeter (diameter) nodule.

The image resolution that was chosen for incorporation in ANDS is a rescaling of the original
image by a factor of 2. That is, each dimension of the original digital image is reduced by half. A
reduction in high frequency image noise may be the cause of the improved detection of nodules in
half-size images. This noise could also have been reduced by averaging multiple scannings of the
image. Presumably, the noise is random and is a result of digitization.
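
One way to halve each image dimension while suppressing random high frequency noise is to average non-overlapping 2 x 2 blocks of pixels. The source does not say how ANDS rescales its images, so the block averaging below, and the function name `halve`, are assumptions made only to illustrate the idea.

```python
def halve(image):
    """Average each non-overlapping 2 x 2 block of a grayscale image.

    `image` is a list of equal-length rows of pixel values. A trailing
    odd row or column is dropped.
    """
    rows = len(image) - len(image) % 2
    cols = len(image[0]) - len(image[0]) % 2
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


# A flat patch of value 100 with alternating +/-4 noise: averaging each
# block cancels the noise exactly in this contrived case.
noisy = [
    [104, 96, 104, 96],
    [96, 104, 96, 104],
    [104, 96, 104, 96],
    [96, 104, 96, 104],
]
print(halve(noisy))  # [[100.0, 100.0], [100.0, 100.0]]
```

Averaging four independent noise samples reduces the noise standard deviation by about half, which is consistent with the suggestion that half-size images are less noisy.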


Performance of the Experts

The performance of this ANDS was assessed on the entire database. A Nodule Expert (a pattern
classifier with a set of classification rules) was trained twice, first with 9 films and then with 37 films
(all of these films contained at least one nodule). The Nodule Expert detected 76% of all known
nodules in the entire database when 37-trained and detected only 58% when 9-trained. This
difference is presumably due to the more extensive training. The contribution of the Rib Expert
may be judged significant because its output value (whether or not a rib was detected) is used by
the pattern recognizer. The output of the Vascularity Expert, however, is not used by the pattern
recognizer. Presumably, this may be attributed to the strength of the features that are used by the
pattern classifier, rather than to a weakness of the Expert. That is, the features may provide a more
concise measure/description of vascularity than does the Expert. The Vascularity Expert essentially
adds knowledge about linear clustering of CNs; this may not be necessary to recognize vascularity.
Knowledge about the CN's appearance and relative location in the lung is perhaps more pertinent.

Comparison of various trainings

Initially, I had planned to compare only the naive ANDS and 9-trained ANDS. Since 11 films (of
37 films that contain at least one nodule) were missed by the 9-trained system, the 37-trained system
was developed and tested. Five films were missed by the 37-trained ANDS. That is, two films in
addition to the three films which were missed by the naive system were missed by 37-trained ANDS.
Both trained systems proved capable of reducing the number of false positives (Table 4.7) at a
statistically significant level (P << 0.0005). As a consequence of reducing the false positive rate, the
average DM for all films tested by both trained systems was also significantly reduced; this is
desirable. However, a statistically significant (P < 0.005) decrease in the true positive rate (the
fraction of all known nodules that are detected) also accompanied the reduction in false positives;
this may be tolerable, although not desirable. The true positive rate (in terms of nodules) for
the naive system, 0.875, is reduced to 0.787 for 37-trained ANDS; this is an 11% reduction. The
true positive rate for films decreased only 6.5% from 92% (for the naive system) to 86% (for 37-
trained ANDS). This suggests that more nodules in films containing multiple nodules are being
missed by the 37-trained system. That is, it seems that the discrepancy in the true positive rates is
not due to missed nodules in films that contain only single nodules.

The  trade-off  between  true  and  false  positives 

One is faced with a trade-off when one desires to reduce the number of false positives (or the
number of CN sites that the radiologist must inspect); this trade-off is between the number of false
positives and the true positive rate. When one desires fewer false positives one must consequently
accept the possibility of detecting fewer nodules. Of course the detection rates of the system may
be improved by further training, but how much improvement can be gained and how much training
would be required is not known. A system with lower false positive and higher true positive rates
may be possible.


6  -  Conclusion 


Pattern recognition techniques and procedurally driven image experts are capable of reducing the
number of CN sites that a radiologist must inspect from at most 12 to at most 4 in order to be 99%
confident of having inspected any nodule(s) detected by 37-trained ANDS. The radiologist must be
willing to accept a film true positive rate of 88% (as opposed to a film true positive rate of 92%) for
the convenience of having fewer points to inspect. These film true positive rates are derived from
37 films which contain nodules that were evaluated by ANDS.
7 - Future Work


Train ANDS with more films and evaluate results

Table 4.8 illustrates the effect of training ANDS with 9 and with 37 films. The film false negative
rate decreases from 30% to 14% when ANDS is trained with more films. I believe that ANDS can
be made more effective if it is trained with more (about 100) films.

Implement parts of ANDS in VLSI hardware

The spline filter, histogram equalization, circle detector, convolution, image search, feature
computation, and pattern recognition phases of ANDS may be implemented in hardware for added
speed of execution.


Compare ANDS with radiologists

Time has not allowed the completion of a comparison between ANDS and radiologists, who are
instructed to find all nodules in a subset of ANDS films. The radiologists are also instructed to rate
their confidence that each is a nodule. These results will be reported at a later time.
Appendix 9.1 - Calibration of Optronics Scanner


The Optronics C-4100 scanner was calibrated according to the maintenance manual
[Optronics, date unknown] using Kodak neutral density filters and the #2 step tablet. Despite this
calibration, few steps above 2.00 on the step tablet were evident in the digitized image. A piece of
film (Kodak Commercial) with a density of 2.0 and the 2.0 neutral density filter were scanned while
the current from the photomultiplier was measured. The measured currents for the film and the
filter were not the same. This difference is attributed to the transmission characteristics of the
objects that were measured. The density of the film is due to silver filaments that are suspended in
the gelatin matrix, while the density of the ND filter is due to carbon particles in a gelatin matrix.
This discrepancy is due to a higher Callier coefficient for the film [James, pp. 488-489]. The
illumination/collection geometry of the scanner is that of a microdensitometer. The light that is
scattered by the larger grains in the film is never collected by the microdensitometer. No light is
lost due to scattering by the ND filter; its Callier factor is very near 1.0.

Thus, the scanner was calibrated using the step tablet, whose density characteristics more
closely approximate those of the reduced radiograph images that are digitized. The scanner was
calibrated to provide the optimum discrimination between two steps (densities about 2.6-2.73).
These two steps were digitized and their digital values were compared (in the final image) and
potentiometer R52 was adjusted to obtain a maximum difference between these steps.


Appendix  9.2  -  Compulation  of  ANOVA:  Processing  Methods 


This  ANOVA  compares  four  image  processing  configurauons  liiat  were  usal  to  delecl 
nodules  in  five  chest  radiographs.  The  model  is: 

$Y_{ij} = \eta + F_i + P_j + \epsilon_{ij}$

where:

$Y_{ij}$ = observed mean

$\eta$ = general mean

$F_i$ = effects due to films

$P_j$ = effects due to processing methods

$\epsilon_{ij}$ = effects due to error
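A model of this form, with one observation per film/processing cell, corresponds to a two-way ANOVA without replication. The following is a minimal sketch of that computation, not the code used in this work. The function name `two_way_anova` and the 5 x 4 table of scores are hypothetical and serve only to show the arithmetic.

```python
def two_way_anova(table):
    """Two-way ANOVA without replication.

    table[i][j] is the observed value for film i and processing method j.
    Returns sums of squares, degrees of freedom, and F ratios.
    """
    n_films = len(table)
    n_methods = len(table[0])
    grand = sum(sum(row) for row in table) / (n_films * n_methods)
    film_means = [sum(row) / n_methods for row in table]
    method_means = [
        sum(row[j] for row in table) / n_films for j in range(n_methods)
    ]

    ss_films = n_methods * sum((m - grand) ** 2 for m in film_means)
    ss_methods = n_films * sum((m - grand) ** 2 for m in method_means)
    # Residual after removing the film and method effects from each cell.
    ss_error = sum(
        (table[i][j] - film_means[i] - method_means[j] + grand) ** 2
        for i in range(n_films)
        for j in range(n_methods)
    )

    df_films = n_films - 1
    df_methods = n_methods - 1
    df_error = df_films * df_methods
    ms_error = ss_error / df_error
    return {
        "ss_films": ss_films,
        "ss_methods": ss_methods,
        "ss_error": ss_error,
        "df_films": df_films,
        "df_methods": df_methods,
        "df_error": df_error,
        "f_films": (ss_films / df_films) / ms_error,
        "f_methods": (ss_methods / df_methods) / ms_error,
    }


# Hypothetical scores: 5 films (rows) x 4 processing methods (columns).
scores = [
    [0.62, 0.70, 0.66, 0.75],
    [0.55, 0.61, 0.60, 0.68],
    [0.71, 0.74, 0.79, 0.80],
    [0.48, 0.57, 0.52, 0.63],
    [0.66, 0.69, 0.73, 0.77],
]
result = two_way_anova(scores)
print(f"F(methods) = {result['f_methods']:.2f} "
      f"on ({result['df_methods']}, {result['df_error']}) degrees of freedom")
```

The F ratio for processing methods would then be compared against a tabulated critical value of the F distribution at the chosen significance level.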

Three measures (true positive rate, TP; false positive rate (1), FP; and a histogram metric (2), QHM) were combined in a weighted average for each film/processing combination to obtain the values that are used in the ANOVA. The weighted average is:

$\text{metric} = 0.5\,TP + 0.3\,(1 - FP) + 0.2\,QHM$
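As a worked example of the weighted average, the sketch below evaluates it for one film/processing combination. It assumes that all three measures are expressed as fractions between 0 and 1; the function name `combined_metric` and the input values are hypothetical.

```python
def combined_metric(tp, fp, qhm):
    """Weighted average 0.5*TP + 0.3*(1 - FP) + 0.2*QHM.

    All inputs are assumed to be fractions in the range [0, 1].
    """
    return 0.5 * tp + 0.3 * (1.0 - fp) + 0.2 * qhm


# Hypothetical values for a single film/processing combination.
example = combined_metric(tp=0.8, fp=0.4, qhm=0.6)
print(f"metric = {example:.2f}")  # 0.40 + 0.18 + 0.12 = 0.70
```

Because the weights sum to 1.0, a perfect result (TP = 1, FP = 0, QHM = 1) gives a metric of 1.0.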

(1) The false positive rate that was used when computing this ANOVA is different from the one described earlier in this thesis. It is defined as the ratio of the number of non-nodules that lie between the first accumulator point and the last detected nodule to the total number of nodules in the accumulator list.


(2) QHM is a histogram metric that is only used in this ANOVA. It is defined as:

$QHM = \sum_{q=0}^{Q-1} \left(1 - \frac{q}{Q}\right) h[q]$

where:

$h[\,]$ = a histogram with Q entries; each entry represents the percentage of detected nodules that were located in a given Q-ile in the list of CNs

$Q$ = the number of quantiles, equal-size divisions of the list of CNs.
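The behavior of QHM can be seen with a few small histograms. The sketch below makes three assumptions that are not stated explicitly in the definition: the sum runs over q = 0 to Q - 1, the histogram entries are fractions that sum to 1 rather than percentages, and q = 0 is the quantile at the top of the candidate list. The function name `qhm` and the histograms are hypothetical.

```python
def qhm(hist):
    """Histogram metric: sum over q of (1 - q/Q) * h[q], with Q = len(hist).

    hist[q] is the fraction of detected nodules found in the q-th quantile
    of the candidate nodule list (q = 0 is assumed to be the top of the list).
    """
    n_quantiles = len(hist)
    return sum((1.0 - q / n_quantiles) * h for q, h in enumerate(hist))


# Hypothetical histograms with Q = 4 quantiles.
all_at_top = qhm([1.0, 0.0, 0.0, 0.0])      # every nodule in the first quantile
all_at_bottom = qhm([0.0, 0.0, 0.0, 1.0])   # every nodule in the last quantile
spread_evenly = qhm([0.25, 0.25, 0.25, 0.25])

print(all_at_top, all_at_bottom, spread_evenly)
```

Under these assumptions the metric rewards configurations that place detected nodules near the top of the list: it equals 1.0 when all detected nodules fall in the first quantile and decreases as they move toward the end.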
                                       1    10   10   10
20    carcinoma       NC    51   M     1    B    -    It
21    granuloma       0.44  53   F     1+   B    5    10
22    granuloma       0.49  56   M     1    NC
23    granuloma       0.83  63   F     1    B    11   11
24    button          NC    44   M     1    -    -
25    nipples         0.56  76   M     2    B    1)   11
26    met(kidney)     0.68  50   M     1    5    5    5
27    met(melanoma)   0.58  21   F     22   B    11   11
28    granuloma       0.83  50   1     1    NC
29    carcinoma       NC    69   F     1    B    10   11
30    carcinoma       0.88  55   F     1    10   10   10
31    carcinoma       0.61  52   F     1    B    NC
32    met(salivary)   0.53  64   F     37   B    1)   B
33    met(breast)     0.94  39   1     1    NC
34    met(breast)     0.61  54   F     5    B    5    It
35    carcinoma       0.99  64   M     1    B    -    11
36    carcinoma       0.69  63   M     1    B    5    5
37    hamartoma       NC    64   M     1    B    5    5
38    carcinoma       1.35  57   F     1    NC
39|1  pseudo-nodule   0.61  51   M     2    B    -    5
40|2  pseudo-nodule   0.48  53   M     2    B    -    5
41|3  pseudo-nodule   0.48  55   M     2    11   NC
42    pseudo-nodule   0.42  48   M     1    -    -
43    granuloma       1.27  45   F     1    -    -
44    nipple          0.49  67   F     1    11
45    none                  29   F
46    none                  35   F
47    none                  64   M
48    none                  77   M
49    none                  30   M
50    none                  56   M

No results were obtainable from entries in boldface because of digitization errors [illegible] files. Film #38 was omitted because the nodule is always found, since it is in the lower medial zone of the right lung.

* Successful radii = radii at which at least one of the nodules present in the films was detected: B = both radii, 5 and 10 (pixels); 5 = 5 pixels (0.5 cm); 10 = 10 pixels (1.0 cm). All films were processed by ANDS at two radii, 5 and 10 pixels.

NC = Not Computed: the data were not computed because of an error in computing nodule statistics (radius data) or because of a scanline error (could not compute lung boundaries).

|n = Adjacent film numbers that are followed by a | are part of a series. The number, n, after the | indicates the chronological position of that film in the series.


